Preface


The preface to the first edition of this text explained our mission as follows: 

This textbook is organized around the principle that much of actuarial 
science consists of the construction and analysis of mathematical models which 
describe the process by which funds flow into and out of an insurance system. 
An analysis of the entire system is beyond the scope of a single text, so we 
have concentrated our efforts on the loss process, that is, the outflow of cash
due to the payment of benefits. 

We have not assumed that the reader has any substantial knowledge of
insurance systems. Insurance terms are defined when they are first used. In
fact, most of the material could be disassociated from the insurance process
altogether, and this book could be just another applied statistics text. What
we have done is kept the examples focused on insurance, presented the material
in the language and context of insurance, and tried to avoid getting into
statistical methods that would have little use in actuarial practice.

In particular, the first edition of this text was published in 1998 to achieve 
three goals: 

1. Update the distribution fitting material from Loss Distributions [59] by 
Robert Hogg and Stuart Klugman, published in 1984. 

2. Update material on discrete distributions and collective risk model
calculations from Insurance Risk Models [106] by Harry Panjer and Gordon
Willmot, published in 1992.






3. Integrate the material the three authors had developed for the Society 
of Actuaries’ Intensive Seminar 152, Applied Risk Theory. 

Shortly after publication, the Casualty Actuarial Society and the Society of 
Actuaries altered their examination syllabus to include our first edition. The
good news was that the first edition was selected as source material for the 
new third and fourth examinations. The bad news was that the subject matter 
was split between the examinations in a manner that was not compatible with 
the organization of the text. By itself, that is sufficient reason to produce a 
revision. But there are others. 

1. The first edition was written with an assumption that readers would 
be familiar with the subject of mathematical statistics. This had been
part of the actuarial examination process at the time the book was 
written but was subsequently removed. Some background material on 
mathematical statistics is now presented in Chapter 9. 

2. For a long time, actuarial education has included the subject of survival 
models. This is the study of determining probability models for time 
to death, failure, or disability. It is not much different from the study 
of determining probability models for the amount or number of claims. 
This edition integrates that subject and in doing so adds an emphasis 
on building empirical models. This is covered in Chapters 10 and 11. 

3. There were two items that had been removed from the actuarial syllabus
over the years that we wanted to see returned, at least in brief.
One is graduation, the smoothing of and interpolation of sequences of 
numbers. This is covered in Chapter 15. The other is the adjustment 
of estimation formulas when dealing with the large amounts of data in 
mortality studies. This is covered in Section 11.4. 

4. While simulation was briefly covered in the first edition, the material 
has been slightly expanded and now appears in Chapter 17. 

With regard to continuing material, besides the rearrangement of the material
on models and modeling, other substantive changes are a significant rewrite
of the ruin theory material (Chapters 7 and 8), and a better explanation of 
the limited fluctuation credibility formulas (Chapter 16). 

While we have attempted to integrate the material into a single, logical 
development of actuarial model building, various sections stand alone, thus 
the division of the text into various parts. 

Since the publication of the first edition, computational power continues 
to increase. For that edition, specialized DOS programs were made available 
for obtaining maximum likelihood estimates and performing aggregate loss 
calculations. Those programs continue to be available at the Wiley ftp site: 

ftp://ftp.wiley.com/public/sci_tech_med/loss_models/


In addition, files containing the data for examples and exercises are also
available. However, it is more likely that users will be calculating using a
spreadsheet program such as Microsoft Excel®.¹ At various places in the text
we indicate how Excel® commands may help. This is not an endorsement by
the authors, but rather a recognition of the pervasiveness of this tool.

As in the first edition, many of the exercises are taken from examinations 
of the Casualty Actuarial Society and the Society of Actuaries. They have 
been reworded to fit the terminology and notation of this text and the five 
answer choices from the original questions are not provided. Such exercises 
are indicated with an asterisk (*). Of course, these questions may not be 
representative of those asked on examinations given in the future. 

Finally, a word about our cover picture. In the summer of 1993 the Des 
Moines and Raccoon Rivers flooded, putting sections of Des Moines, Iowa 
under water. At the left center of the picture is the Des Moines Water 
Works. Contamination knocked out the water supply for 12 days. While 
living through adverse events can be interesting, it is not necessary to do so 
to build probability models for their occurrence, timing, and severity. Our 
thanks to Melissa Sharer of the Des Moines Water Works for providing the 
picture. 

S. A. KLUGMAN
H. H. PANJER
G. E. WILLMOT

Des Moines, Iowa 
Waterloo, Ontario 



¹ Microsoft® and Excel® are either registered trademarks or trademarks of Microsoft
Corporation in the United States and/or other countries.
Modeling


1.1 THE MODEL-BASED APPROACH 

The model-based approach should be considered in the context of the
objectives of any given problem. Many problems in actuarial science involve
the building of a mathematical model that can be used to forecast or predict
insurance costs in the future.

A model is a simplified mathematical description which is constructed based
on the knowledge and experience of the actuary combined with data from the
past. The data guide the actuary in selecting the form of the model as well
as in calibrating unknown quantities, usually called parameters. The model
provides a balance between simplicity and conformity to the available data.

The simplicity is measured in terms of such things as the number of unknown
parameters (the fewer the simpler); the conformity to data is measured
in terms of the discrepancy between the data and the model. Model selection 
is based on a balance between the two criteria, namely, fit and simplicity. 

1.1.1 The modeling process 

The modeling process is illustrated in Figure 1.1, which describes six stages. 


This approach was codified by the Society of Actuaries Committee on Actuarial
Principles. In the publication “Principles of Actuarial Science” [123],
p. 571, Principle 3.1 states that “Actuarial risks can be stochastically modeled
based on assumptions regarding the probabilities that will apply to the actuarial
risk variables in the future, including assumptions regarding the future
environment.” The actuarial risk variables referred to are occurrence, timing,
and severity, that is, the chances of a claim event, the time at which the
event occurs if it does, and the cost of settling the claim.


1.2 ORGANIZATION OF THIS BOOK 


This text takes the reader through the modeling process, but not in the order
presented in the previous section. There is a difference between how models
are best applied and how they are best learned. In this text we first learn
about the models and how to use them. This is followed by instruction in how
to determine which model to use. The reason is that it is difficult to select
models in a vacuum. Unless the analyst has a thorough knowledge of the
set of available models, it is difficult to narrow the choice to the ones worth
considering. With that in mind, the organization of the text is as follows:


1. Review of probability: Almost by definition, contingent events imply
probability models. Chapters 2 and 3 review random variables and some 
of the basic calculations that may be done with such models, including 
moments and percentiles. 


ability distributions — In order to select a probability 
large collection of such 
a priori model choice, 
Chapter 6. Usually, the payments arrive sequentially through time. It
is possible, if the payments turn out to be large, that at some point 
the entity will run out of money. This state of affairs is called ruin. In 
Chapters 7 and 8 models are established that allow for the calculation 
of the probability this will happen. 

5. Review of mathematical statistics ― Because most of the models being 
considered are probability models, techniques of mathematical statistics 
are needed to estimate model specifications and make choices. While
Chapter 9 is not a replacement for a thorough text or course in mathematical
statistics, it does contain the essential items needed later in this
text.

6. Construction of empirical models — Sometimes it is appropriate to work 
with the empirical distribution of the data. It may be because the volume
of data is sufficient or because a good portrait of the data is needed.
Chapters 10 and 11 cover this for the simple case of straightforward data,
adjustments for truncated and censored data, and modifications suitable
for large data sets, particularly those encountered in mortality studies.

7. Construction of parametric models: Often it is valuable to smooth the
data and thus represent the population by a probability distribution. 
Chapter 12 provides methods for parameter estimation for the models 
introduced earlier. Model selection is covered in Chapter 13. 

8. Chapter 14 contains examples that summarize and integrate the topics 
discussed to this point. 


9. Adjustment of estimates — At times, further adjustment 
is needed. Two such adjustments are covered in this tex 











































Part II 


Actuarial models 



2 . 

Random variables 


2.1 INTRODUCTION 


An actuarial model is an attempt to represent an uncertain stream of future
payments. The uncertainty may be with respect to any or all of occurrence (is
there a payment?), timing (when is the payment made?), and severity (how
much is paid?). Because the most useful means of representing uncertainty
is through probability, we concentrate on probability models. In all cases,
the relevant probability distributions are assumed to be known. Determining
appropriate distributions is covered in Chapters 10-13. In this part, the
following aspects of actuarial probability models will be covered:


1. Definition of random variable, important functions, and some examples.

2. Basic calculations from probability models. 

3. Specific probability distributions and their properties. 

4. More advanced calculations using severity models. 

5. Models incorporating the possibility of a random number of payments 
each of random amount. 

6. Models that track a company’s surplus through time. 


There are two important models that are absent from this text. The first
is a model for investment earnings in future years. While such techniques are
within the scope of this text, the additional finance background needed to
motivate these models would detract from the primary purpose of this text.
The second is a model that combines the earning of interest (whether random
or not) with the timing of the payment. While simple models of this type


particular outcome that occurs will determine the : 
Attaching probabilities to the various outcomes a 
expectations and the risk of not meeting them. In t 


a list of cells containing policy type, age range, gender, and


To expand on this concept, consider the following definitions from the latest
working draft of “Joint Principles of Actuarial Science”¹:


Phenomena are occurrences that can be observed. An experiment is
an observation of a given phenomenon under specified conditions. The
result of an experiment is called an outcome; an event is a set of one or
more possible outcomes. A stochastic phenomenon is a phenomenon for
which an associated experiment has more than one possible outcome.
An event associated with a stochastic phenomenon is said to be contingent.
Probability is a measure of the likelihood of the occurrence of an
event. It is measured on a scale of increasing likelihood from zero to
one. A random variable is a function that assigns a numerical value to
every possible outcome.

The following list contains a number of random variables encountered in
actuarial work:


The age at death of a randomly selected birth. (Model 1) 

The time to death from when insurance was purchased for a randomly 
selected insured life. 

The time from occurrence of a disabling event to recovery or death for 
a randomly selected workers compensation claimant. 


¹ This document is a work in progress of a joint committee from the Casualty Actuarial
Society and the Society of Actuaries. Key principles are that models exist that represent
• F(x) is nondecreasing.

• F(x) is right-continuous. 


Because it need not be left-continuous, it is possible for the distribution function
to jump. When it jumps, the value is assigned to the top of the jump.
Here are possible distribution functions for each of the four models.

Model 1 This random variable could serve as a model for the age at death.
All ages between 0 and 100 are possible. While experience suggests that there
is an upper bound for human lifetime, models with no upper limit may be
useful if they assign extremely low probabilities to extreme ages. This allows
the modeler to avoid setting a specific maximum age.

$$
F_1(x) = \begin{cases} 0, & x < 0, \\ 0.01x, & 0 \le x < 100, \\ 1, & x \ge 100. \end{cases}
$$
□

Model 2 This random variable could serve as a model for the number of
dollars paid on an automobile insurance claim. All positive values are possible.
As with mortality, there is more than likely an upper limit (all the money
in the world comes to mind), but this model illustrates that in modeling,
correspondence to reality need not be perfect.


Example 2.2 Draw graphs of the distribution function for Models 1 and 2
(graphs for the other models are requested in Exercise 2.2). 

The graphs appear in Figures 2.1 and 2.2. □ 

Model 3 This random variable could serve as a model for the number of
claims. Probability is concentrated at the five points
(0, 1, 2, 3, 4) and the probability at each is given by the size of the jump in the
distribution function. While this model places a maximum on the number of
claims, models with no limit (such as the Poisson distribution) could also be
Fig. 2.1 Distribution function for Model 1.

Fig. 2.2 Distribution function for Model 2.


r positive values. 
$$
F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}
$$
□

Definition 2.3 The support of a random variable is the set of numbers that 
are possible values of the random variable. 

Definition 2.4 A random variable is called discrete if the support contains
at most a countable number of values. It is called continuous if the distribution
function is continuous and is differentiable everywhere with the possible
exception of a countable number of values. It is called mixed if it is not
discrete and is continuous everywhere with the exception of at least one value
and at most a countable number of values.

These three definitions do not exhaust all possible random variables but 
will cover all cases encountered in this text. The distribution function for a 
discrete variable will be constant except for jumps at the values with positive 
probability. A mixed distribution will have at least one jump. Requiring
continuous variables to be differentiable allows the variable to have a density 
function (defined later) at almost all values. 

Example 2.5 For each of the four models, determine the support and indicate
which type of random variable it is.

The distribution function for Model 1 is continuous and is differentiable
except at 0 and 100 and therefore is a continuous distribution. The support
is values from 0 to 100 with it not being clear if 0 or 100 are included. The
distribution function for Model 2 is continuous and is differentiable except
at 0 and therefore is a continuous distribution. The support is all positive
numbers and perhaps 0. The random variable for Model 3 places probability
only at 0, 1, 2, 3, and 4 (the support) and thus is discrete. The distribution
function for Model 4 is continuous except at 0, where it jumps. It is a mixed
distribution with support on nonnegative numbers. □
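The classification in Example 2.5 can be checked numerically. The following is a minimal Python sketch, not part of the original text: it codes the distribution functions given for Models 1 and 4 and estimates the jump at a point by evaluating just to its left. The helper name jump_at and the tolerance eps are illustrative assumptions.

```python
import math


def F1(x):
    """Distribution function of Model 1 (continuous on 0 to 100)."""
    if x < 0:
        return 0.0
    return 0.01 * x if x < 100 else 1.0


def F4(x):
    """Distribution function of Model 4 (jump at 0, continuous afterward)."""
    if x < 0:
        return 0.0
    return 1.0 - 0.3 * math.exp(-0.00001 * x)


def jump_at(F, x, eps=1e-9):
    """Approximate the jump F(x) - F(x-) by evaluating slightly left of x."""
    return F(x) - F(x - eps)


# Model 1 has no jump at 0, so there is no discrete probability there.
print(jump_at(F1, 0.0))
# Model 4 jumps at 0; the size of the jump is Pr(X = 0).
print(jump_at(F4, 0.0))
```

A distribution function with no jumps anywhere is a candidate for a continuous model, while one with a jump and a continuous stretch, as for Model 4, is mixed.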

These four models illustrate the most commonly encountered forms of the 
distribution function. For the remainder of this text, values of functions like 
the distribution function will be presented only for values in the range of the 
support of the random variable. 

Definition 2.6 The survival function, usually denoted $S_X(x)$ or S(x), for
a random variable X is the probability that X is greater than a given number.
That is, $S_X(x) = \Pr(X > x) = 1 - F_X(x)$.

As a result: 

• 0 ≤ S(x) ≤ 1 for all x.



• S(x) is nonincreasing.

• S(x) is right-continuous.

• $\lim_{x \to -\infty} S(x) = 1$ and $\lim_{x \to \infty} S(x) = 0$.

Because the survival function need not be left-continuous, it is possible for it 
to jump (down). When it jumps, the value is assigned to the bottom of the 
jump. 

Because the survival function is the complement of the distribution function,
knowledge of one implies knowledge of the other. Historically, when the
random variable is measuring time, the survival function is presented, while
when it is measuring dollars, the distribution function is presented.

Example 2.7 For completeness, here are the survival functions for the four













Fig. 2.4 Survival function for Model 2.


That is, f(x) = F'(x) = -S'(x). The density function is defined only at those
points where the derivative exists. The abbreviation pdf is often used.


While the density function does not directly provide probabilities, it does 
provide relevant information. Values of the random variable in regions with 
higher density values are more likely to occur than those in regions with lower 
values. Probabilities for intervals and the distribution and survival functions 
can be recovered by integration. That is, when the density function is defined
over the relevant interval, $\Pr(a < X \le b) = \int_a^b f(x)\,dx$, $F(b) = \int_{-\infty}^b f(x)\,dx$,
and $S(b) = \int_b^{\infty} f(x)\,dx$.
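As an illustration of recovering probabilities by integration, the sketch below approximates two such integrals with a simple midpoint rule. It is a Python sketch under stated assumptions: the densities are those of Models 1 and 4 from this chapter, while the function integrate, the step count, and the chosen intervals are illustrative and not from the text.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def f1(x):
    """Density of Model 1: constant on the interval from 0 to 100."""
    return 0.01 if 0 < x < 100 else 0.0


def f4(x):
    """Density of the continuous part of Model 4 (valid for x > 0 only)."""
    return 0.000003 * math.exp(-0.00001 * x)


# Pr(20 < X <= 40) for Model 1.
p1 = integrate(f1, 20, 40)
# Pr(0 < X <= 100,000) for Model 4; the probability at 0 is excluded.
p4 = integrate(f4, 0, 100_000)
print(p1, p4)
```

For Model 4 the density covers only the continuous part, so the integral from 0 upward omits the discrete probability at 0.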


Fig. 2.5 Density function for Model 1.

Example 2.10 For our models,

$$
\begin{aligned}
f_1(x) &= 0.01, \quad 0 < x < 100, \\
f_2(x) &= \frac{3(2{,}000)^3}{(x + 2{,}000)^4}, \quad x > 0, \\
f_3(x) &\ \text{is not defined}, \\
f_4(x) &= 0.000003e^{-0.00001x}, \quad x > 0.
\end{aligned}
$$

It should be noted that for Model 4 the density function does not completely 
describe the probability distribution. As a mixed distribution, there is also 
discrete probability at 0. □ 

Example 2.11 Graph the density function for Models 1 and 2. 

The graphs appear in Figures 2.5 and 2.6. □ 


Definition 2.12 The probability function, also called the probability mass
function, usually denoted $p_X(x)$ or p(x), describes the probability at a distinct
point when it is not 0. The formal definition is $p_X(x) = \Pr(X = x)$.








which there is a 
Basic distributional quantities


3.1 MOMENTS 


There are a variety of interesting calculations that can be done from the
models described in the previous chapter. Examples are the average amount
paid on a claim that is subject to a deductible or policy limit or the average
remaining lifetime of a person age 40.

Definition 3.1 The kth raw moment of a random variable is the expected
(average) value of the kth power of the variable, provided it exists. It is denoted
by $E(X^k)$ or by $\mu'_k$. The first raw moment is called the mean of the random
variable and is usually denoted by $\mu$.


Note that $\mu$ is not related to $\mu(x)$, the force of mortality as mentioned on
page 21. For random variables that take on only nonnegative values [that
is, Pr(X ≥ 0) = 1], k may be any real number. When presenting formulas
for calculating this quantity, a distinction between continuous and discrete
variables needs to be made. Formulas will be presented for random variables
that are either everywhere continuous or everywhere discrete. For mixed
models, evaluate the formula by integrating with respect to its density function
wherever the random variable is continuous and by summing with respect to
its probability function wherever the random variable is discrete and adding
the results. The formula for the kth raw moment is

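$$
\begin{aligned}
\mu'_k = E(X^k) &= \int_{-\infty}^{\infty} x^k f(x)\,dx && \text{if the random variable is continuous} \\
&= \sum_j x_j^k\, p(x_j) && \text{if the random variable is discrete,}
\end{aligned}
\tag{3.1}
$$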

where the sum is to be taken over all Xj with positive probability. Finally, 
note that it is possible that the integral or sum will not converge, in which 
case the moment is said not to exist. 

Example 3.2 Determine the first two raw moments for each of the five models.

The subscripts on the random variable X indicate which model is being 
used. 


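$$
\begin{aligned}
E(X_1) &= \int_0^{100} x(0.01)\,dx = 50, \\
E(X_1^2) &= \int_0^{100} x^2(0.01)\,dx = 3{,}333.33, \\
E(X_2) &= \int_0^{\infty} x\,\frac{3(2{,}000)^3}{(x + 2{,}000)^4}\,dx = 1{,}000, \\
E(X_2^2) &= \int_0^{\infty} x^2\,\frac{3(2{,}000)^3}{(x + 2{,}000)^4}\,dx = 4{,}000{,}000, \\
E(X_3) &= 0(0.5) + 1(0.25) + 2(0.12) + 3(0.08) + 4(0.05) = 0.93, \\
E(X_3^2) &= 0(0.5) + 1(0.25) + 4(0.12) + 9(0.08) + 16(0.05) = 2.25, \\
E(X_4) &= 0(0.7) + \int_0^{\infty} x(0.000003)e^{-0.00001x}\,dx = 30{,}000, \\
E(X_4^2) &= 0^2(0.7) + \int_0^{\infty} x^2(0.000003)e^{-0.00001x}\,dx = 6{,}000{,}000{,}000, \\
E(X_5) &= \int_0^{50} x(0.01)\,dx + \int_{50}^{75} x(0.02)\,dx = 43.75, \\
E(X_5^2) &= \int_0^{50} x^2(0.01)\,dx + \int_{50}^{75} x^2(0.02)\,dx = 2{,}395.83.
\end{aligned}
$$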
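Several of these values can be reproduced with a few lines of code. The sketch below is in Python and is not part of the original text: it applies (3.1) to Model 3 by summation, to Model 1 by a midpoint-rule integral, and to the mixed Model 4 by adding the discrete and continuous parts. The function names, the step count, and the finite upper limit used in place of infinity for Model 4 are assumptions made for illustration.

```python
import math


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def raw_moment_discrete(pf, k):
    """kth raw moment of a discrete model given as {value: probability}."""
    return sum(x**k * p for x, p in pf.items())


# Model 3: probabilities at 0, 1, 2, 3, 4 as used in Example 3.2.
model3 = {0: 0.5, 1: 0.25, 2: 0.12, 3: 0.08, 4: 0.05}
m3_1 = raw_moment_discrete(model3, 1)
m3_2 = raw_moment_discrete(model3, 2)


def raw_moment_model1(k):
    """kth raw moment of Model 1 (density 0.01 on 0 to 100)."""
    return integrate(lambda x: x**k * 0.01, 0.0, 100.0)


def raw_moment_model4(k, upper=5_000_000.0):
    """kth raw moment of the mixed Model 4.

    The discrete part contributes 0**k * 0.7; the integral to infinity is
    truncated at `upper` (an assumption), beyond which the density is
    negligible.
    """
    discrete_part = 0.0**k * 0.7
    continuous_part = integrate(
        lambda x: x**k * 0.000003 * math.exp(-0.00001 * x), 0.0, upper
    )
    return discrete_part + continuous_part


print(m3_1, m3_2, raw_moment_model1(1), raw_moment_model4(1))
```

The mixed case follows the rule stated before (3.1): sum where the variable is discrete, integrate where it is continuous, and add the results.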


Before proceeding further, an additional model will be introduced. This 
one looks similar to Model 3, but with one key difference. It is discrete, 
but with the added requirement that all of the probabilities must be integral
Definition 3.3 The empirical model is a discrete distribution based on a
sample of size n which assigns probability 1/n to each data point.

Model 6 Consider a sample of size 8 in which the observed data points were 
3, 5, 6, 6, 6, 7, 7, and 10. The empirical model then has probability function 

$$
p_6(x) = \begin{cases} 0.125, & x = 3, \\ 0.125, & x = 5, \\ 0.375, & x = 6, \\ 0.25, & x = 7, \\ 0.125, & x = 10. \end{cases}
$$
□


Alert readers will note that many discrete models with finite support look
like empirical models. Model 3 could have been the empirical model for a
sample of size 100 that contained 50 zeros, 25 ones, 12 twos, 8 threes, and 5
fours. Regardless, we will use the term empirical model only when there is an
actual sample behind it. The two moments for Model 6 are

$$E(X_6) = 6.25, \qquad E(X_6^2) = 42.5$$

using the same approach as in Model 3. It should be noted that the mean
of this random variable is equal to the sample arithmetic average (also called
the sample mean).
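The empirical model is easy to build directly from a sample. The following Python sketch, which is an illustration and not part of the original text, constructs the probability function of Model 6 from its eight data points and computes the two raw moments with exact fractions. The function name empirical_pf is an assumption.

```python
from collections import Counter
from fractions import Fraction


def empirical_pf(sample):
    """Probability function assigning 1/n to each of the n data points."""
    n = len(sample)
    counts = Counter(sample)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}


sample = [3, 5, 6, 6, 6, 7, 7, 10]
pf = empirical_pf(sample)

# First two raw moments, computed as in the discrete case of (3.1).
mean = sum(x * p for x, p in pf.items())
second = sum(x**2 * p for x, p in pf.items())

print(pf)
print(float(mean), float(second))
```

Repeated observations simply accumulate probability, which is why the value 6 carries 3/8 in Model 6.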

Definition 3.4 The kth central moment of a random variable is the expected
value of the kth power of the deviation of the variable from its mean.
It is denoted by $E[(X - \mu)^k]$ or by $\mu_k$. The second central moment is usually
called the variance and denoted $\sigma^2$ and its square root, $\sigma$, is called the standard
deviation. The ratio of the standard deviation to the mean is called
the coefficient of variation. The ratio of the third central moment to the
cube of the standard deviation, $\gamma_1 = \mu_3/\sigma^3$, is called the skewness. The ratio
of the fourth central moment to the fourth power of the standard deviation,
$\gamma_2 = \mu_4/\sigma^4$, is called the kurtosis.¹

The continuous and discrete formulas for calculating central moments are 



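$$
\begin{aligned}
\mu_k = E[(X - \mu)^k] &= \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx && \text{if the random variable is continuous} \\
&= \sum_j (x_j - \mu)^k\, p(x_j) && \text{if the random variable is discrete.}
\end{aligned}
\tag{3.2}
$$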


¹ It would be more accurate to call these items the “coefficient of skewness” and “coefficient
of kurtosis” because there are other quantities that also measure asymmetry and flatness.
The simpler expressions will be used in this text.





In reality, the integral need be taken only over those x values where f(x) is
positive. The standard deviation is a measure of how much the probability
is spread out over the random variable's possible values. It is measured in
the same units as the random variable itself. The coefficient of variation
measures the spread relative to the mean. The skewness is a measure of
asymmetry. A symmetric distribution has a skewness of zero, while a positive
skewness indicates that probabilities to the right tend to be assigned to values
further from the mean than those to the left. The kurtosis measures flatness
of the distribution relative to a normal distribution (which has a kurtosis of
3). Kurtosis values above 3 indicate that (keeping the standard deviation
constant), relative to a normal distribution, more probability tends to be at
points away from the mean than at points near the mean. The coefficients of
variation, skewness, and kurtosis are all dimensionless.

There is a link between raw and central moments. The following equation 
indicates the connection between second moments. The development uses the 
continuous version from (3.1) and (3.2), but the result applies to all random 
variables. 

$$
\begin{aligned}
\mu_2 &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} (x^2 - 2x\mu + \mu^2) f(x)\,dx \\
&= E(X^2) - 2\mu E(X) + \mu^2 = \mu'_2 - \mu^2.
\end{aligned}
\tag{3.3}
$$
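The link in (3.3) can be confirmed numerically for a discrete model. The Python sketch below is an illustration only: the three-point probability function is hypothetical (it is not one of the six models), and the function names are assumptions. It computes the variance both directly from (3.2) and from the raw moments as in (3.3), then forms the dimensionless coefficients defined in Definition 3.4.

```python
import math


def raw_moment(pf, k):
    """kth raw moment of a discrete model given as {value: probability}."""
    return sum(x**k * p for x, p in pf.items())


def central_moment(pf, k):
    """kth central moment, using the discrete case of (3.2)."""
    mu = raw_moment(pf, 1)
    return sum((x - mu) ** k * p for x, p in pf.items())


# Hypothetical probability function, chosen only for illustration.
pf = {1: 0.2, 2: 0.5, 4: 0.3}

variance_direct = central_moment(pf, 2)
variance_via_raw = raw_moment(pf, 2) - raw_moment(pf, 1) ** 2  # (3.3)

sigma = math.sqrt(variance_direct)
coefficient_of_variation = sigma / raw_moment(pf, 1)
skewness = central_moment(pf, 3) / sigma**3
kurtosis = central_moment(pf, 4) / sigma**4

print(variance_direct, variance_via_raw)
print(coefficient_of_variation, skewness, kurtosis)
```

Both routes give the same variance; the second is often more convenient when raw moments are already tabulated.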

Example 3.5 The density function of the gamma distribution appears to be 
positively skewed. Demonstrate that this is true and illustrate with graphs. 

From Appendix A, the first three raw moments of the gamma distribution
are $\alpha\theta$, $\alpha(\alpha + 1)\theta^2$, and $\alpha(\alpha + 1)(\alpha + 2)\theta^3$. From (3.3) the variance is $\alpha\theta^2$ and
from the solution to Exercise 3.1 the third central moment is $2\alpha\theta^3$. Therefore,
the skewness is $2\alpha^{-1/2}$. Because $\alpha$ must be positive, the skewness is always
positive. Also, as $\alpha$ decreases, the skewness increases.

Consider the following two gamma distributions. One has parameters $\alpha = 0.5$
and $\theta = 100$ while the other has $\alpha = 5$ and $\theta = 10$. These have the same
mean, but their skewness coefficients are 2.83 and 0.89, respectively. Figure
3.1 demonstrates the difference. □
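The two skewness values in Example 3.5 can be reproduced from the raw moments quoted from Appendix A. The Python sketch below is an illustration and not part of the original text; it uses the standard identity that the third central moment equals the third raw moment minus three times the mean times the second raw moment plus twice the cube of the mean. The function names are assumptions.

```python
import math


def gamma_raw_moments(alpha, theta):
    """First three raw moments of the gamma distribution (as in Example 3.5)."""
    m1 = alpha * theta
    m2 = alpha * (alpha + 1) * theta**2
    m3 = alpha * (alpha + 1) * (alpha + 2) * theta**3
    return m1, m2, m3


def skewness_from_raw(m1, m2, m3):
    """Skewness computed from the first three raw moments."""
    variance = m2 - m1**2                # equation (3.3)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3   # third central moment
    return mu3 / variance**1.5


for alpha, theta in [(0.5, 100.0), (5.0, 10.0)]:
    m1, m2, m3 = gamma_raw_moments(alpha, theta)
    print(alpha, theta, m1, round(skewness_from_raw(m1, m2, m3), 2))
```

Both parameter pairs give a mean of 50, and the computed skewness agrees with the closed form $2\alpha^{-1/2}$.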

Note that when calculating the standard deviation for Model 6 in Exercise
3.2 the result is the sample standard deviation using n (as opposed to the
more commonly used n - 1) in the denominator. Finally, it should be noted
that when calculating moments it is possible that the integral or sum will not
exist (as is the case for the third and fourth moments for Model 2). For the
models we typically encounter, the integrand and summand are nonnegative
and so failure to exist implies that the required limit that gives the integral
or sum is infinity. See Example 4.15 for an illustration.




Fig. 3.1 Densities of f(x) ~ gamma(0.5, 100) and g(x) ~ gamma(5, 10).

Definition 3.6 For a given value of d with Pr(X > d) > 0, the excess loss
variable is Y = X - d given that X > d. Its expected value,

$$e_X(d) = e(d) = E(Y) = E(X - d \mid X > d),$$

is called the mean excess loss function. Other names for this expectation
are mean residual life function and complete expectation of life. When
the latter terminology is used, the commonly used symbol is $\mathring{e}_d$.

This variable could also be called a left truncated and shifted variable.
It is left truncated because observations below d are discarded. It is shifted
because d is subtracted from the remaining values. When X is a payment
variable, the mean excess loss is the expected amount paid given that there
has been a payment in excess of a deductible of d. When X is the age at
death, the mean excess loss is the expected remaining time until death given
that the person is alive at age d. The kth moment of the excess loss variable
is determined from

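$$
\begin{aligned}
e_X^k(d) &= \frac{\int_d^{\infty} (x - d)^k f(x)\,dx}{1 - F(d)} && \text{if the variable is continuous} \\
&= \frac{\sum_{x_j > d} (x_j - d)^k\, p(x_j)}{1 - F(d)} && \text{if the variable is discrete.}
\end{aligned}
\tag{3.4}
$$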

Here, $e_X^k(d)$ is defined only if the integral or sum converges. There is a particularly
convenient formula for calculating the first moment. The development
is given below for the continuous version, but the result holds for all random
variables. The second line uses integration by parts, where the
antiderivative of f(x) is taken as -S(x).

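$$
\begin{aligned}
e_X(d) &= \frac{\int_d^{\infty} (x - d) f(x)\,dx}{1 - F(d)} \\
&= \frac{-(x - d)S(x)\big|_d^{\infty} + \int_d^{\infty} S(x)\,dx}{S(d)} \\
&= \frac{\int_d^{\infty} S(x)\,dx}{S(d)}.
\end{aligned}
\tag{3.5}
$$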
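The agreement between the defining integral in (3.4) and the survival-function form can be checked numerically. The Python sketch below is an illustration under stated assumptions: the distribution is a hypothetical uniform on the interval from 0 to 200 (not one of the numbered models), and the function names and step count are likewise illustrative.

```python
W = 200.0  # hypothetical upper limit of a uniform distribution on (0, W)


def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x):
    """Density of the hypothetical uniform distribution."""
    return 1.0 / W if 0 < x < W else 0.0


def S(x):
    """Survival function of the hypothetical uniform distribution."""
    if x < 0:
        return 1.0
    return 1.0 - x / W if x < W else 0.0


def mean_excess_definition(d):
    """First moment from (3.4): integral of (x - d) f(x) over x > d, over S(d)."""
    return integrate(lambda x: (x - d) * f(x), d, W) / S(d)


def mean_excess_survival(d):
    """Survival-function form: integral of S(x) over x > d, over S(d)."""
    return integrate(S, d, W) / S(d)


print(mean_excess_definition(50.0), mean_excess_survival(50.0))
```

For this uniform distribution both expressions reduce to (W - d)/2, so a deductible of 50 gives a mean excess loss of 75.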

Definition 3.7 The left censored and shifted variable is

$$
Y = (X - d)_+ = \begin{cases} 0, & X < d, \\ X - d, & X \ge d. \end{cases}
$$

It is left censored because values below d are not ignored but are set equal 
to 0. There is no standard name or symbol for the moments of this variable. 
For dollar events, the distinction between the excess loss variable and the left 
censored and shifted variable is one of per payment versus per loss. In the 
former situation, the variable exists only when a payment is made. The latter 
variable takes on the value 0 whenever a loss produces no payment. The 
moments can be calculated from 

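$$
\begin{aligned}
E[(X - d)_+^k] &= \int_d^{\infty} (x - d)^k f(x)\,dx && \text{if the variable is continuous,} \\
&= \sum_{x_j > d} (x_j - d)^k\, p(x_j) && \text{if the variable is discrete.}
\end{aligned}
\tag{3.6}
$$

It should be noted that

$$E[(X - d)_+^k] = e^k(d)[1 - F(d)]. \tag{3.7}$$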

Example 3.8 Construct graphs to illustrate the difference between the excess 
loss variable and the left censored and shifted variable. 

The two graphs in Figures 3.2 and 3.3 plot the modified variable Y as
a function of the unmodified variable X. The only difference is that for X
values below 100 the excess loss variable is undefined while for the left censored and
shifted variable it is set equal to zero. □
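The per payment versus per loss distinction is also easy to see on a sample. The Python sketch below is an illustration and not part of the original text: it uses the eight data points of the empirical Model 6 with a deductible of d = 5 (a value chosen here for illustration) and checks relationship (3.7) for k = 1.

```python
sample = [3, 5, 6, 6, 6, 7, 7, 10]  # data behind the empirical Model 6
d = 5                               # illustrative deductible

# Excess loss variable: defined only for observations above d (per payment).
excess = [x - d for x in sample if x > d]
mean_excess = sum(excess) / len(excess)

# Left censored and shifted variable: zero when the loss does not exceed d
# (per loss), so every observation contributes.
censored = [max(x - d, 0) for x in sample]
mean_censored = sum(censored) / len(censored)

# Empirical survival function at d: proportion of observations above d.
survival_at_d = len(excess) / len(sample)

print(mean_excess, mean_censored, survival_at_d)
# Relationship (3.7) with k = 1: E[(X - d)+] = e(d) [1 - F(d)].
print(mean_excess * survival_at_d == mean_censored)
```

The excess loss average divides by the number of payments, while the left censored and shifted average divides by the number of losses.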


The next definition provides a complementary function to the excess loss. 


Definition 3.9 The limited loss variable is

$$
Y = X \wedge u = \begin{cases} X, & X < u, \\ u, & X \ge u. \end{cases}
$$

Its expected value, $E[X \wedge u]$, is called the limited expected value.



Fig. 3.3 Left censored and shifted variable. 


This variable could also be called the right censored variable. It is right
censored because values above u are set equal to u. An insurance phenomenon
that relates to this variable is the existence of a policy limit that sets
a maximum on the benefit to be paid. Note that $(X - d)_+ + (X \wedge d) = X$.
That is, buying one policy with a limit of d and another with a deductible of
d is equivalent to buying full coverage. This is illustrated in Figure 3.4.
Fig. 3.4 Limit of 100 plus deductible of 100 equals full coverage.
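The decomposition into a limited loss and a left censored and shifted loss holds loss by loss, which the following Python sketch illustrates. The loss amounts are hypothetical values chosen for illustration, and the limit and deductible of 100 mirror the caption of Figure 3.4.

```python
losses = [20, 80, 100, 150, 400]  # hypothetical loss amounts
d = 100                           # policy limit and deductible

# Payment under a policy with a limit of d: the limited loss variable.
limited = [min(x, d) for x in losses]
# Payment under a policy with a deductible of d: the left censored and
# shifted variable.
above_deductible = [max(x - d, 0) for x in losses]

for x, low, high in zip(losses, limited, above_deductible):
    # The two payments always add up to the full loss.
    print(x, low, high, low + high == x)

n = len(losses)
print(sum(limited) / n + sum(above_deductible) / n == sum(losses) / n)
```

Because the identity holds for every loss, it also holds for the expected values, which is the relationship behind the statement about full coverage.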


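The most direct formulas for the kth moment of the limited loss variable are

$$
\begin{aligned}
E[(X \wedge u)^k] &= \int_{-\infty}^{u} x^k f(x)\,dx + u^k[1 - F(u)] \\
&\qquad \text{if the random variable is continuous} \\
&= \sum_{x_j \le u} x_j^k\, p(x_j) + u^k[1 - F(u)] \\
&\qquad \text{if the random variable is discrete.}
\end{aligned}
\tag{3.8}
$$

Another interesting formula is derived as follows: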


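$$
\begin{aligned}
E[(X \wedge u)^k] &= \int_{-\infty}^{0} x^k f(x)\,dx + \int_{0}^{u} x^k f(x)\,dx + u^k[1 - F(u)] \\
&= x^k F(x)\big|_{-\infty}^{0} - \int_{-\infty}^{0} k x^{k-1} F(x)\,dx \\
&\qquad - x^k S(x)\big|_{0}^{u} + \int_{0}^{u} k x^{k-1} S(x)\,dx + u^k S(u) \\
&= -\int_{-\infty}^{0} k x^{k-1} F(x)\,dx + \int_{0}^{u} k x^{k-1} S(x)\,dx,
\end{aligned}
$$

where the second line uses integration by parts. For k = 1, we have

$$E(X \wedge u) = -\int_{-\infty}^{0} F(x)\,dx + \int_{0}^{u} S(x)\,dx. \tag{3.9}$$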
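For a nonnegative random variable, F(x) = 0 for x < 0, so the first integral in (3.9) vanishes and the limited expected value is the integral of the survival function from 0 to u. The Python sketch below compares this with the direct formula (3.8). It is an illustration under stated assumptions: the distribution is a hypothetical exponential with mean 1,000 (not one of the numbered models), and the function names, the limit u = 500, and the midpoint rule are illustrative choices.

```python
import math

THETA = 1000.0  # mean of a hypothetical exponential distribution


def f(x):
    """Density of the hypothetical exponential distribution, x > 0."""
    return math.exp(-x / THETA) / THETA


def S(x):
    """Survival function of the hypothetical exponential distribution, x >= 0."""
    return math.exp(-x / THETA)


def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def limited_expected_value_direct(u):
    """Formula (3.8) with k = 1; the lower limit is 0 because X >= 0."""
    return integrate(lambda x: x * f(x), 0.0, u) + u * S(u)


def limited_expected_value_survival(u):
    """Formula (3.9) for a nonnegative variable: integral of S from 0 to u."""
    return integrate(S, 0.0, u)


u = 500.0
print(limited_expected_value_direct(u), limited_expected_value_survival(u))
```

Both values agree with the closed form THETA * (1 - exp(-u / THETA)) that follows from integrating the exponential survival function.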


The corresponding formula for discrete random variables is not particularly
interesting. The limited expected value also represents the expected dollar
saving per incident when a deductible is imposed. The kth limited moment
of many common continuous distributions is presented in Appendix A. Exercise
3.8 asks you to develop a relationship between the three first moments
introduced previously.

3.1.1 Exercises

3.1 Develop formulas similar to (3.3) for $\mu_3$ and $\mu_4$.

3.2 Calculate the standard deviation, skewness, and kurtosis for each of the
six models. It may help to note that Model 2 is a Pareto distribution and the
density function in the continuous part of Model 4 is an exponential distribution.
Formulas that may help with calculations for these models appear in
Appendix A.

3.3 (*) A random variable has a mean and a coefficient of variation of 2. The
third raw moment is 136. Determine the skewness.

3.4 (*) Determine the skewness of a gamma distribution that has a coefficient
of variation of 1.

3.5 Determine the mean excess loss function for Models 1-4. Compare the
functions for Models 1, 2, and 4.

3.6 (*) For two random variables, X and Y, $e_Y(30) = e_X(30) + 4$. Let X
have a uniform distribution on the interval from 0 to 100 and let Y have a
uniform distribution on the interval from 0 to w. Determine w.

3.7 (*) A random variable has density function $f(x) = \lambda^{-1} e^{-x/\lambda}$, $x, \lambda > 0$.
Determine $e(\lambda)$, the mean residual life function evaluated at $\lambda$.

3.8 Show that the following relationship holds:

$$E(X) = e(d)S(d) + E(X \wedge d). \tag{3.10}$$

3.9 Determine the limited expected value function for Models 1-4. Do this
using both (3.8) and (3.10). For Models 1 and 2 also obtain the function using
(3.9).

3.10 (*) Which of the following statements are true?

(a) The mean residual life function for an empirical distribution is continuous.

(b) The mean residual life function for an exponential distribution is
constant.

(c) If it exists, the mean residual life function for a Pareto distribution
is decreasing.
3.11 (*) Losses have a Pareto distribution with $\alpha = 0.5$ and $\theta = 10{,}000$.
Determine the mean residual life at 10,000.

3.12 Define a right truncated variable and provide a formula for its kth
moment.

3.13 (*) The severity distribution of individual claims has pdf 

$$f(x) = 2.5x^{-3.5}, \quad x \ge 1.$$

Determine the coefficient of variation. 

3.14 (*) Claim sizes are for 100, 200, 300, 400, or 500. The true probabilities 
for these values are 0.05, 0.20, 0.50, 0.20, and 0.05, respectively. Determine 
the skewness and kurtosis for this distribution. 

3.15 (*) Losses follow a Pareto distribution, with $\alpha > 1$ and $\theta$ unspecified.
Determine the ratio of the mean excess loss function at $x = 2\theta$ to the mean
excess loss function at $x = \theta$.


3.2 PERCENTILES 

One other value of interest that may be derived from the distribution function
is the percentile function. It is the inverse of the distribution function, but
because this quantity is not well defined, an arbitrary definition must be
created.

Definition 3.10 The 100pth percentile of a random variable is any value
$\pi_p$ such that $F(\pi_p-) \le p \le F(\pi_p)$. The 50th percentile, $\pi_{0.5}$, is called the
median.

If the distribution function has a value of p for one and only one x value, 
then the percentile is uniquely defined. In addition, if the distribution function 
jumps from a value below p to a value above p, then the percentile is at the 
location of the jump. The only time the percentile is not uniquely defined 
is when the distribution function is constant at a value of p over a range of 
values. In that case, any value in that range can be used as the percentile. 

Example 3.11 Determine the 50th and 80th percentiles for Models 1 and 3. 

For Model 1, the 100pth percentile can be obtained from $p = F(\pi_p) = 0.01\pi_p$
and so $\pi_p = 100p$, and in particular, the requested percentiles are 50 and
80 (see Figure 3.5). For Model 3 the distribution function equals 0.5 for all
$0 \le x < 1$ and so all such values can be the 50th percentile. For the 80th
percentile, note that at $x = 2$ the distribution function jumps from 0.75 to
0.87 and so $\pi_{0.8} = 2$ (see Figure 3.6). □
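Definition 3.10 can be applied mechanically to a discrete distribution by comparing $F(x-)$ and $F(x)$ with p. The sketch below is illustrative only. The probabilities at 0, 1, and 2 follow from the distribution function values quoted in the example; the probabilities assigned to 3 and 4 are an assumption made so that the total is 1.

```python
def cdf(pmf, x):
    """F(x) = Pr(X <= x) for a discrete distribution given as {value: prob}."""
    return sum(p for v, p in pmf.items() if v <= x)


def cdf_left(pmf, x):
    """F(x-) = Pr(X < x), the left-hand limit of the distribution function."""
    return sum(p for v, p in pmf.items() if v < x)


def is_percentile(pmf, x, p, tol=1e-12):
    """True if x satisfies F(x-) <= p <= F(x), up to a small tolerance."""
    return cdf_left(pmf, x) <= p + tol and p <= cdf(pmf, x) + tol


# F(0) = 0.5, F(1) = 0.75, F(2) = 0.87 as in the example;
# the split of the remaining 0.13 between 3 and 4 is assumed.
model3 = {0: 0.50, 1: 0.25, 2: 0.12, 3: 0.08, 4: 0.05}

print(is_percentile(model3, 0.5, 0.5))  # a point where F is flat at 0.5
print(is_percentile(model3, 2, 0.8))    # F jumps across 0.8 at x = 2
print(is_percentile(model3, 3, 0.8))    # F(3-) already exceeds 0.8
```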


Fig. 3.5 Percentiles for Model 1. 





Fig. 3.6 Percentiles for Model 3.


3.16 (*) The cdf of a random variable is $F(x) = 1 - x^{-2}$, $x \ge 1$. Determine
the mean, median, and mode of this random variable.

3.17 Determine the 50th and 80th percentiles for Models 2, 4, 5, and 6.

3.18 (*) Losses have a Pareto distribution with parameters $\alpha$ and $\theta$. The
10th percentile is $\theta - k$. The 90th percentile is $5\theta - 3k$. Determine the value
of $\alpha$.

3.19 (*) Losses have a Weibull distribution with parameters $\tau$ and $\theta$. The
25th percentile is 1,000 and the 75th percentile is 100,000. Determine the
value of $\tau$.
3.3 GENERATING FUNCTIONS AND SUMS OF RANDOM VARIABLES

An insurance company rarely insures only one person. The total claims paid
on all policies is the sum of all payments. Thus it is useful to be able to
determine properties of $S_k = X_1 + \cdots + X_k$. The first result is a version of
the central limit theorem.

Theorem 3.12 For a random variable $S_k$ as defined above, $E(S_k) = E(X_1) + \cdots + E(X_k)$.
Also, if $X_1, \ldots, X_k$ are independent, $\mathrm{Var}(S_k) = \mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_k)$.
If the random variables $X_1, X_2, \ldots$ are independent and their first
two moments meet certain conditions, $\lim_{k \to \infty} [S_k - E(S_k)]/\sqrt{\mathrm{Var}(S_k)}$ has a
normal distribution with mean 0 and variance 1.
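In practice the theorem is used to approximate tail probabilities of a sum of independent, identically distributed payments. The following is a rough sketch of that calculation; the function names and the numerical inputs are hypothetical, and the normal cdf is computed from the standard library error function.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clt_exceedance(n, mean, variance, threshold):
    """Normal approximation to Pr(S_n > threshold) for n iid summands."""
    total_mean = n * mean                 # E(S_n) = n E(X)
    total_sd = math.sqrt(n * variance)    # Var(S_n) = n Var(X) under independence
    z = (threshold - total_mean) / total_sd
    return 1.0 - normal_cdf(z)


# Hypothetical portfolio: 50 claims, each with mean 200 and variance 10,000.
print(clt_exceedance(50, 200.0, 10_000.0, 11_000.0))
```

The approximation uses only the first two moments of the individual payments, so it can be poor when n is small or the severity distribution is very heavy-tailed.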

Obtaining the distribution or density function of Sk is usually very difficult. 
However, there are a few cases where it is simple. The key to this simplicity 
is the generating function. 

Definition 3.13 For a random variable X, the moment generating function
(mgf) is $M_X(t) = E(e^{tX})$ for all t for which the expected value exists.
The probability generating function (pgf) is $P_X(z) = E(z^X)$ for all z for
which the expectation exists.

Note that $M_X(t) = P_X(e^t)$ and $P_X(z) = M_X(\ln z)$. Often the mgf is used


variable’s distribution function and its 
with different distribution functions c 
following result aids in working with s 



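Theorem 3.14 Let $S_k = X_1 + \cdots + X_k$, where the random variables in the
sum are independent. Then $M_{S_k}(t) = \prod_{j=1}^{k} M_{X_j}(t)$ and $P_{S_k}(z) = \prod_{j=1}^{k} P_{X_j}(z)$,
provided all the component mgfs and pgfs exist.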
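The moment generating function of a gamma variable is

$$\begin{aligned}
E(e^{tX}) &= \int_0^\infty \frac{e^{tx} x^{\alpha-1} e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}}\,dx = \int_0^\infty \frac{x^{\alpha-1} e^{-x(-t+1/\theta)}}{\Gamma(\alpha)\theta^{\alpha}}\,dx \\
&= \int_0^\infty \frac{y^{\alpha-1}(-t+1/\theta)^{-\alpha} e^{-y}}{\Gamma(\alpha)\theta^{\alpha}}\,dy \\
&= \frac{\Gamma(\alpha)(-t+1/\theta)^{-\alpha}}{\Gamma(\alpha)\theta^{\alpha}} = \left(\frac{1}{1-\theta t}\right)^{\alpha}, \quad t < 1/\theta.
\end{aligned}$$

Now let $X_j$ have a gamma distribution with parameters $\alpha_j$ and $\theta$. Then the
moment generating function of the sum is

$$M_{S_k}(t) = \prod_{j=1}^{k} \left(\frac{1}{1-\theta t}\right)^{\alpha_j} = \left(\frac{1}{1-\theta t}\right)^{\alpha_1 + \cdots + \alpha_k},$$

which is the moment generating function of a gamma distribution with
parameters $\alpha_1 + \cdots + \alpha_k$ and $\theta$. □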


Example 3.16 Obtain the mgf and pgf for the Poisson distribution. 


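$$P_X(z) = \sum_{x=0}^{\infty} z^x \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(z\lambda)^x}{x!} = e^{-\lambda} e^{z\lambda} = e^{\lambda(z-1)}.$$

Then the mgf is $M_X(t) = P_X(e^t) = \exp[\lambda(e^t - 1)]$.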
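The closed forms for the Poisson pgf and mgf can be compared with a truncated version of the defining series. This is only a numerical sanity check under assumed values of $\lambda$, z, and t; the function names are hypothetical.

```python
import math


def poisson_pgf_series(z, lam, terms=200):
    """Truncated series for E(z**X) when X is Poisson with mean lam."""
    total = 0.0
    term = math.exp(-lam)  # probability at x = 0 times z**0
    for x in range(terms):
        total += term
        term *= z * lam / (x + 1)  # moves from the x term to the x + 1 term
    return total


def poisson_pgf_closed(z, lam):
    """Closed form exp[lam (z - 1)]."""
    return math.exp(lam * (z - 1.0))


def poisson_mgf_closed(t, lam):
    """M_X(t) = P_X(e**t) = exp[lam (e**t - 1)]."""
    return poisson_pgf_closed(math.exp(t), lam)


lam = 3.0  # assumed Poisson mean
print(poisson_pgf_series(0.7, lam), poisson_pgf_closed(0.7, lam))
print(poisson_pgf_series(math.exp(0.2), lam), poisson_mgf_closed(0.2, lam))
```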


3.3.1 Exercises 

3.20 (*) A portfolio contains 16 independent risks^each with a gamma dis- 


the incomplete gamma function for the 
exceeds 6,000. Then approximate this ! 




3.21 (*) The severities of individual cls 

parameters a = 8/3， and 6 = 8,000. ———— - 

proximate the probability that the sum of 100 independent claims will exceed 
600,000. 


3.22 (*) The severities of individual claims have the gamma distribution (see
Appendix A) with parameters $\alpha = 5$ and $\theta = 1{,}000$. Use the central limit
theorem to approximate the probability that the sum of 100 independent
claims exceeds 525,000.

mple of 1,000 health insurance contracts on adults produced a sam- 
)f 1,300 for the annual benefits paid with a standard deviation of 
expected that 2,500 contracts will be issued next year. Use the 
dt theorem to estimate the probability that benefit payments will 
lan 101% of the expected amount. 

v that the mgf for the inverse Gaussian distribution with 0 replaced 
/6 is given by 

M{t) = exp ^ (l - Vl-2/3t) , t< 1/(2 外 


Classifying and creating 
distributions 


4.1 INTRODUCTION 

The set of all possible distribution functions is too large to comprehend.
Therefore, when searching for a distribution function to use as a model for
a random phenomenon, it can be helpful if the field can be narrowed. One
division that has already been discussed is the separation into discrete,
continuous, and mixed distributions. In most situations it will be obvious which of
the three applies. Beyond this, we need more artificial distinctions. The next
section describes a split based on the complexity of the model. The following
section then looks at the shape of the distribution to distinguish one from
another. After that, a few methods of creating additional distributions are
introduced. This is followed by a listing of some commonly used continuous
distributions. The final section is an extensive treatment of discrete
distributions. By the end of this chapter, most of the distributions in Appendices
A and B will have been introduced. While this chapter is more about
differences from one distribution to another, these distributions have some common
elements that are desirable for actuarial models. Among them are:

• The support is a subset of the nonnegative real numbers. Most actuarial
phenomena are counts or are measurements of time or money and as
such are rarely negative, although if the random variable of interest
is financial gain, negative outcomes are certainly possible. However,
financial gain is often the result of the realization of several nonnegative
variables, some measuring income and some measuring expenses.

Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.

• Some distributions are special cases of others. This allows the modeler
to choose from models of varying complexity.

• They can have one or more modes and a mode may or may not be at 
zero. 


4.2 THE ROLE OF PARAMETERS 

This split has to do with how much information is needed to specify the model.
The number of quantities (parameters) needed to do so gives some indication 
of how complex a model is, in the sense that many items are needed to describe 
it. Arguments for a simple model include the following: 

• With few items required in its specification, it is more likely that each 
item can be determined more accurately. 


• It is more likely to be stable across time and across settings. That is, 
if the model does well today, it (perhaps with small changes to reflect 
inflation or similar phenomena) will probably do well tomorrow and will 
also do well in other, similar, situations. 

• Because data can often be irregular, a simple model may provide
necessary smoothing.

Of course, complex models also have advantages. 

• With many items required in its specification, it can more closely match 
reality. 

• With many items required in its specification, it can more closely match 
irregularities in the data. 


Another way to express the difference is that simpler models can be
estimated more accurately, but the model itself may be too superficial. The
principle of parsimony states that the simplest model that adequately reflects
reality should be used. The definition of “adequately” will depend on


These 


Definition 4.1 A parametric distribution is a set of distribution func¬ 
tions, each member of which is determined by specifying one or more values 
called parameters. The number of parameters is fixed and finite. 

The most familiar parametric distribution is the normal distribution with
parameters $\mu$ and $\sigma^2$. When values for these two parameters are specified,
the distribution function is completely known.

These are the simplest distributions in this subsection, because typically
only a small number of values need to be specified. All of the individual
distributions in Appendices A and B are parametric. Within this class,
distributions with fewer parameters are simpler than those with more parameters.

For much of actuarial modeling work, it is especially convenient if the name 
of the distribution is unchanged when the random variable is multiplied by a 
constant. The most common uses for this phenomenon are to model the effect 
of inflation and to accommodate a change in the monetary unit. 

Definition 4.2 A parametric distribution is a scale distribution if, when
a random variable from that set of distributions is multiplied by a positive
constant, the resulting random variable is also in that set of distributions.

Example 4.3 Demonstrate that the exponential distribution is a scale
distribution.

According to Appendix A, the distribution function is $F_X(x) = 1 - e^{-x/\theta}$.
Let $Y = cX$, where $c > 0$. Then,

$$\begin{aligned}
F_Y(y) &= \Pr(Y \le y) \\
&= \Pr(cX \le y),
\end{aligned}$$
Pareto distributions. They also found that five parameters were not necessary.
The distribution they selected has cdf

$$F(x) = 1 - a\left(\frac{\theta_1}{\theta_1 + x}\right)^{\alpha} - (1 - a)\left(\frac{\theta_2}{\theta_2 + x}\right)^{\alpha + 2}.$$

Note that the shape parameters in the two Pareto distributions differ by 2.
The second distribution places more probability on smaller values. This might
be a model for frequent, small claims while the first distribution covers large,
but infrequent claims. This distribution has only four parameters, bringing
some parsimony to the modeling process. □

Suppose we do not know how many distributions should be in the mixture.
Then the value of K becomes a parameter, as indicated in the following
definition.

Definition 4.10 A variable-component mixture distribution has a
distribution function that can be written as

$$F(x) = \sum_{j=1}^{K} a_j F_j(x), \quad \sum_{j=1}^{K} a_j = 1, \quad a_j > 0, \ j = 1, \ldots, K, \ K = 1, 2, \ldots.$$

These models have been called semiparametric because in complexity they
are between parametric models and nonparametric models (see the next
subsection). This distinction becomes more important when model selection is
discussed in Chapter 13. When the number of parameters is to be estimated
from data, hypothesis tests to determine the appropriate number of
parameters become more difficult. When all of the components have the same
parametric distribution (but different parameters), the resulting distribution
is called a “variable mixture of gs” distribution, where g stands for the name
of the component distribution.

Example 4.11 Determine the distribution, density, and hazard rate
functions for the variable mixture of exponentials distribution.

A combination of exponential distribution functions can be written

$$F(x) = 1 - a_1 e^{-x/\theta_1} - a_2 e^{-x/\theta_2} - \cdots - a_K e^{-x/\theta_K},$$

$$\sum_{j=1}^{K} a_j = 1, \quad a_j, \theta_j > 0, \ j = 1, \ldots, K, \ K = 1, 2, \ldots,$$

and then the other functions are

$$f(x) = a_1 \theta_1^{-1} e^{-x/\theta_1} + a_2 \theta_2^{-1} e^{-x/\theta_2} + \cdots + a_K \theta_K^{-1} e^{-x/\theta_K},$$

$$h(x) = \frac{a_1 \theta_1^{-1} e^{-x/\theta_1} + a_2 \theta_2^{-1} e^{-x/\theta_2} + \cdots + a_K \theta_K^{-1} e^{-x/\theta_K}}{a_1 e^{-x/\theta_1} + a_2 e^{-x/\theta_2} + \cdots + a_K e^{-x/\theta_K}}.$$

is at least as complex as 
iber of cc parameters 37 in- 
towledge increases. 
many (or more) “parameters” than


6 on page 27 is a data-dependent distribution. Each data point contributes
probability 1/n to the probability function, so the n parameters are the n
observations in the data set that produced the empirical distribution.

Another example of a data-dependent model is the kernel smoothing model,
which is covered in more detail in Section 11.3. Rather than place a spike of
probability 1/n at each data point, a continuous density function with area
1/n replaces the data point. This piece is centered at the data point so that
this model follows the data, but not perfectly. It provides some smoothing
versus the empirical distribution. A simple example is given below.

Example 4.14 Construct a kernel smoothing model from Model 6 using the
uniform kernel and a bandwidth of 2.

The probability density function is

$$f(x) = \sum_{j=1}^{5} p_6(x_j) K_j(x), \qquad K_j(x) = \begin{cases} 0, & |x - x_j| > 2, \\ 0.25, & |x - x_j| \le 2, \end{cases}$$

where the sum is taken over the five points where the original model has
positive probability. For example, the first term of the sum is the function



can also be written as mixture distributions. The reason these models are
classified separately is that the number of components relates to the sample
size rather than to the phenomenon and its random variable.
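A uniform-kernel density of the kind used in Example 4.14 can be sketched as follows. Model 6 itself is not reproduced in this section, so the five support points and their probabilities below are an assumption made purely for illustration; only the bandwidth of 2 (and hence the kernel height 0.25) comes from the example.

```python
def uniform_kernel_density(x, points, probs, bandwidth):
    """Kernel-smoothed density: each point x_j contributes probs[j] times a
    uniform density of height 1 / (2 * bandwidth) on [x_j - b, x_j + b]."""
    height = 1.0 / (2.0 * bandwidth)
    return sum(p * height for xj, p in zip(points, probs) if abs(x - xj) <= bandwidth)


# Assumed empirical model with five support points (illustrative values only).
points = [3, 5, 6, 7, 10]
probs = [0.125, 0.125, 0.375, 0.25, 0.125]

for x in (1, 3, 6.5, 20):
    print(x, uniform_kernel_density(x, points, probs, bandwidth=2))
```

Because each term is a weight times a uniform density, the result is itself a mixture of uniform distributions.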

4.2.5 Exercises 

4.1 Demonstrate that the lognormal distribution as parameterized in Appen- 




could 


bution families? Which could be considered to be from variable-component 
mixture distributions? 

4.7 (*) Seventy-five percent of claims have a normal distribution with a mean 
of 3,000 and a variance of 1,000,000. The remaining 25% have a normal 
distribution with a mean of 4,000 and a variance of 1,000,000. Determine the 
probability that a randomly selected claim exceeds 5,000. 

4.8 (*) Let X have a Burr distribution with parameters $\alpha = 1$, $\gamma = 2$, and
$\theta = \sqrt{1{,}000}$ and let Y have a Pareto distribution with parameters $\alpha = 1$
and $\theta = 1{,}000$. Let Z be a mixture of X and Y with equal weight on each
component. Determine the median of Z. Let $W = 1.1Z$. Demonstrate that
W is also a mixture of a Burr and a Pareto distribution and determine the
parameters of W.

4.9 (*) Consider three random variables : 叉 is a mixture of a uniform distri- 


uniform distribution o 
ndom variables, one is


1. Match these random variables with the following descriptions: 

(a) Both the distribution and density functions are continuous.

(b) The distribution function is continuous but the density function is 
discontinuous. 

(c) The distribution function is discontinuous. 

4.10 Demonstrate that the model in Example 4.14 is a mixture of uniform
distributions.

4.11 Show that the inverse Gaussian distribution as parameterized in
Appendix A is a scale family but does not have a scale parameter.

4.12 Show that the Weibull distribution has a scale parameter.


4.3 TAIL WEIGHT

The tail of a distribution (more properly, the right tail) is that part that
reveals probabilities about large values. It is of interest to actuaries because it is
the occurrence (or lack thereof) of large values that is most influential on
profits. Risky types of insurance such as medical malpractice feature more large
claims (relative to the mean) than less risky insurances such as automobile
..,, - j_ — 加 fiiof fonH to hieher probabilities to 


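Example 4.15 Demonstrate that for the gamma distribution all positive raw
moments exist but for the Pareto distribution they do not.

For the gamma distribution

$$\begin{aligned}
\mu_k' &= \int_0^\infty x^k \frac{x^{\alpha-1} e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}}\,dx \\
&= \int_0^\infty (y\theta)^k \frac{(y\theta)^{\alpha-1} e^{-y}}{\Gamma(\alpha)\theta^{\alpha}}\,\theta\,dy, \quad \text{making the substitution } y = x/\theta \\
&= \frac{\theta^k}{\Gamma(\alpha)}\,\Gamma(\alpha + k) < \infty \text{ for all } k > 0,
\end{aligned}$$

while for the Pareto distribution

$$\begin{aligned}
\mu_k' &= \int_0^\infty x^k \frac{\alpha\theta^{\alpha}}{(x+\theta)^{\alpha+1}}\,dx \\
&= \int_\theta^\infty (y - \theta)^k \frac{\alpha\theta^{\alpha}}{y^{\alpha+1}}\,dy, \quad \text{making the substitution } y = x + \theta \\
&= \alpha\theta^{\alpha} \int_\theta^\infty \sum_{j=0}^{k} \binom{k}{j} y^{j-\alpha-1} (-\theta)^{k-j}\,dy \quad \text{for integer values of } k.
\end{aligned}$$

The integral exists only if all of the exponents on y in the sum are less than
$-1$. That is, $j - \alpha - 1 < -1$ for all j or, equivalently, $k < \alpha$. Therefore, only
some moments exist. □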
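The conclusion of the example can be illustrated numerically. The gamma formula below is the one derived above. The Pareto closed form $\theta^k \Gamma(k+1)\Gamma(\alpha-k)/\Gamma(\alpha)$ for $k < \alpha$ is not derived in this section; it is the standard result and is used here as an assumption. Parameter values are hypothetical.

```python
import math


def gamma_raw_moment(k, alpha, theta):
    """E(X**k) for the gamma distribution: theta**k Gamma(alpha + k) / Gamma(alpha)."""
    return theta**k * math.gamma(alpha + k) / math.gamma(alpha)


def pareto_raw_moment(k, alpha, theta):
    """E(X**k) for the Pareto distribution (assumed standard closed form).

    The moment is finite only when k < alpha; otherwise the integral diverges
    and infinity is returned to signal that the moment does not exist."""
    if k >= alpha:
        return math.inf
    return theta**k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)


# Gamma: every positive moment is finite.
print([gamma_raw_moment(k, 2.0, 3.0) for k in (1, 2, 3, 4)])

# Pareto with alpha = 2.5: only the first two integer moments are finite.
print([pareto_raw_moment(k, 2.5, 150.0) for k in (1, 2, 3, 4)])
```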





















「= 3 and 0 = 10 and a gamma distribution with parameters 
15. Both distributions have a mean of 5 and a variance of 7! 
consistent with the algebraic derivation. 


4.3.3 Hazard rate and mean residual life patterns

The nature of the hazard rate function also reveals information about 
of the distribution. If the hazard rate function is decreasing, then i 
values the chance of that value becomes small and the chance of large 


becomes greater. Thus the 
if the hazard rate function 




















That is, $h(x) \to 1/\theta$ as $x \to \infty$.


The mean residual life also gives information about tail weight. If the
mean residual life function is increasing in d, then at large values the expected
outcome is much larger and thus probability is moved to the right, indicating
a heavier tail than a model where the mean residual life function is decreasing
or is increasing at a slower rate. In fact, the mean residual life function and
the hazard rate are closely related in several ways. First, note that

$$\frac{S(y+d)}{S(d)} = \frac{\exp\left[-\int_0^{y+d} h(x)\,dx\right]}{\exp\left[-\int_0^{d} h(x)\,dx\right]} = \exp\left[-\int_d^{y+d} h(x)\,dx\right] = \exp\left[-\int_0^{y} h(d+t)\,dt\right].$$

Therefore, if the hazard rate is decreasing, then for fixed y it follows that
$\int_0^y h(d+t)\,dt$ is a decreasing function of d, and from the above $S(y+d)/S(d)$
is an increasing function of d. But from (3.5), the mean residual life function
may be expressed as

$$e(d) = \frac{\int_d^\infty S(x)\,dx}{S(d)} = \int_0^\infty \frac{S(y+d)}{S(d)}\,dy.$$

Thus, if the hazard rate is a decreasing function, then the mean residual
life function $e(d)$ is an increasing function of d, because the same is true of
$S(y+d)/S(d)$ for fixed y. Similarly, if the hazard rate is an increasing function,
then the mean residual life function is a decreasing function. It is worth noting
(and is perhaps counterintuitive), however, that the converse implication is not
true. Exercise 4.16 gives an example of a distribution which has a decreasing
mean residual life function, but the hazard rate is not increasing for all values.
Nevertheless, the implications described above are generally consistent with
the above discussions of heaviness of the tail.

There is a second relationship between the mean residual life function and
the hazard rate. As $d \to \infty$, $S(d)$ and $\int_d^\infty S(x)\,dx$ go to 0. Thus, the limiting
behavior of the mean residual life function as $d \to \infty$ may be ascertained via
L'Hôpital's rule because (3.5) holds. We have

$$\lim_{d \to \infty} e(d) = \lim_{d \to \infty} \frac{\int_d^\infty S(x)\,dx}{S(d)} = \lim_{d \to \infty} \frac{-S(d)}{-f(d)} = \lim_{d \to \infty} \frac{1}{h(d)}$$

as long as the indicated limits exist. These limiting relationships are useful if
$S(x)$ [and hence also $h(x)$ and $e(d)$] is complicated.


Example 4.18 Examine the behavior of the mean residual life function of 
the gamma distribution. 
$\int_x^\infty S(t)\,dt = e(0)S_e(x)$, and from (3.5), $\int_x^\infty S(t)\,dt = e(x)S(x)$. Equating
these two expressions results in

$$\frac{e(x)}{e(0)} = \frac{S_e(x)}{S(x)}.$$

If the mean residual life function is increasing (implied if the hazard rate is
decreasing), then $e(x) \ge e(0)$, which is obviously equivalent to $S_e(x) \ge S(x)$
from the above equality. This in turn implies that

$$\int_0^\infty S_e(x)\,dx \ge \int_0^\infty S(x)\,dx.$$

But $E(X) = \int_0^\infty S(x)\,dx$ from Definition 3.6 and (3.5) if $S(0) = 1$. Also,
$\int_0^\infty S_e(x)\,dx = \int_0^\infty x f_e(x)\,dx$ since both sides represent the mean of the
equilibrium distribution. This may be evaluated using (3.9) with $u = \infty$, $k = 2$,
and $F(0) = 0$ to give the equilibrium mean, that is,

$$\int_0^\infty S_e(x)\,dx = \int_0^\infty x f_e(x)\,dx = \frac{1}{E(X)} \int_0^\infty x S(x)\,dx = \frac{E(X^2)}{2E(X)}.$$

The inequality may thus be expressed as

$$\frac{E(X^2)}{2E(X)} \ge E(X),$$

or using $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ as $\mathrm{Var}(X) \ge [E(X)]^2$. That is, the
squared coefficient of variation, and hence the coefficient of variation itself, is
at least 1 if $e(x) \ge e(0)$. Reversing the inequalities implies that the coefficient
of variation is at most 1 if $e(x) \le e(0)$, in turn implied if the mean residual
life function is decreasing or the hazard rate is increasing. These values of the
coefficient of variation are consistent with the comments made here about the
heaviness of the tail.

4.3.4 Exercises 

4.13 Using the methods in this section (except for the mean residual life),
compare the tail weight of the Weibull and inverse Weibull distributions.

4.14 Arguments as in Example 4.16 place the lognormal distribution
between the gamma and Pareto distributions with regard to tail weight. To
reinforce this conclusion, consider a gamma distribution with parameters
$\alpha = 0.2$, $\theta = 500$; a lognormal distribution with parameters $\mu = 3.709290$,
$\sigma = 1.338566$; and a Pareto distribution with parameters $\alpha = 2.5$, $\theta = 150$.
First demonstrate that all three distributions have the same mean and
variance. Then numerically demonstrate that there is a value such that the gamma
pdf is smaller than the lognormal and Pareto pdfs for all arguments above that
value and that there is another value such that the lognormal pdf is smaller
than the Pareto pdf for all arguments above that value.
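As a quick plausibility check on the parameter values quoted in Exercise 4.14 (not a full solution), the means and variances can be computed from the usual closed forms. The gamma, lognormal, and Pareto moment formulas used below are the standard ones and are assumed here rather than taken from this section.

```python
import math


def gamma_mean_var(alpha, theta):
    """Mean alpha*theta and variance alpha*theta**2."""
    return alpha * theta, alpha * theta**2


def lognormal_mean_var(mu, sigma):
    """Mean exp(mu + sigma**2 / 2) and variance mean**2 * (exp(sigma**2) - 1)."""
    mean = math.exp(mu + sigma**2 / 2.0)
    return mean, mean**2 * (math.exp(sigma**2) - 1.0)


def pareto_mean_var(alpha, theta):
    """Mean theta / (alpha - 1); variance requires alpha > 2."""
    mean = theta / (alpha - 1.0)
    var = alpha * theta**2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    return mean, var


print(gamma_mean_var(0.2, 500.0))
print(lognormal_mean_var(3.709290, 1.338566))
print(pareto_mean_var(2.5, 150.0))
```

The lognormal figures agree with the other two only up to the rounding of the quoted parameters.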

4.15 Let Y be a random variable that has the equilibrium density from (4.2).
That is, $f_Y(y) = f_e(y) = S_X(y)/E(X)$ for some random variable X. Use
integration by parts to show that


whenever Mx{t) exists. 

4.16 You are given that the random variable X has probability density
function $f(x) = (1 + 2x^2)e^{-2x}$, $x \ge 0$.

(a) Determine the survival function $S(x)$.

(b) Determine the hazard rate $h(x)$.

(c) Determine the survival function $S_e(x)$ of the equilibrium
distribution.

(d) Determine the mean residual life function $e(x)$.

(e) Determine $\lim_{x \to \infty} h(x)$ and $\lim_{x \to \infty} e(x)$.

(f) Prove that $e(x)$ is strictly decreasing but $h(x)$ is not strictly
increasing.

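4.17 Assume that X has probability density function $f(x)$, $x > 0$.

(a) Prove that

$$S_e(x) = \frac{\int_x^\infty (y - x) f(y)\,dy}{E(X)}.$$

(b) Use (a) to show that

$$\int_x^\infty y f(y)\,dy = xS(x) + E(X)S_e(x).$$

(c) Prove that (b) may be rewritten as

$$S(x) = \frac{\int_x^\infty y f(y)\,dy}{x + e(x)}$$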


and that this in turn implies that 
(d) Use (c) to prove that, if $e(x) \ge e(0)$, then

and that this in turn implies that the mean is at least as large as
the (smallest) median.

(e) Prove that (b) may be rewritten as

$$S_e(x) = \frac{e(x)}{x + e(x)} \cdot \frac{\int_x^\infty y f(y)\,dy}{E(X)}$$

and thus that

$$S_e(x) \le \frac{e(x)}{x + e(x)}.$$

4.4 CREATING NEW DISTRIBUTIONS
4.4.1 Introduction 

This section indicates how new parametric distributions can be created from 




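Theorem 4.19 Let X be a continuous random variable with pdf $f_X(x)$ and
cdf $F_X(x)$. Let $Y = \theta X$ with $\theta > 0$. Then

$$F_Y(y) = F_X\left(\frac{y}{\theta}\right), \qquad f_Y(y) = \frac{1}{\theta} f_X\left(\frac{y}{\theta}\right).$$

Proof:

$$F_Y(y) = \Pr(Y \le y) = \Pr(\theta X \le y) = \Pr\left(X \le \frac{y}{\theta}\right) = F_X\left(\frac{y}{\theta}\right),$$

$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{1}{\theta} f_X\left(\frac{y}{\theta}\right). \quad □$$

Corollary 4.20 The parameter $\theta$ is a scale parameter for the random variable
Y.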


The following example illustrates this process. 
Example 4.21 Let X have pdf $f(x) = e^{-x}$, $x > 0$. Determine the cdf and
pdf of $Y = \theta X$.

$$F_X(x) = 1 - e^{-x}, \qquad F_Y(y) = 1 - e^{-y/\theta},$$

$$f_Y(y) = \frac{1}{\theta} e^{-y/\theta}.$$

We recognize this as the exponential distribution. □
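Theorem 4.19 amounts to a simple recipe: evaluate the base functions at $y/\theta$ and divide the density by $\theta$. A minimal sketch follows, using the unit exponential of Example 4.21 as the base distribution; the helper name `scale_transform` and the value of $\theta$ are hypothetical.

```python
import math


def scale_transform(cdf_x, pdf_x, theta):
    """Return the cdf and pdf of Y = theta * X, given those of X (theta > 0)."""
    def cdf_y(y):
        return cdf_x(y / theta)

    def pdf_y(y):
        return pdf_x(y / theta) / theta

    return cdf_y, pdf_y


def base_cdf(x):
    """Unit exponential cdf, 1 - exp(-x) for x > 0."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def base_pdf(x):
    """Unit exponential pdf, exp(-x) for x > 0."""
    return math.exp(-x) if x > 0 else 0.0


theta = 500.0  # assumed scale parameter
cdf_y, pdf_y = scale_transform(base_cdf, base_pdf, theta)
print(cdf_y(500.0), 1.0 - math.exp(-1.0))
print(pdf_y(500.0), math.exp(-1.0) / theta)
```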

4.4.3 Raising to a power 

Theorem 4.22 Let X be a continuous random variable with pdf $f_X(x)$ and
cdf $F_X(x)$ with $F_X(0) = 0$. Let $Y = X^{1/\tau}$. Then, if $\tau > 0$,

$$F_Y(y) = F_X(y^{\tau}), \qquad f_Y(y) = \tau y^{\tau-1} f_X(y^{\tau}), \quad y > 0,$$

while, if $\tau < 0$,

$$F_Y(y) = 1 - F_X(y^{\tau}), \qquad f_Y(y) = -\tau y^{\tau-1} f_X(y^{\tau}). \qquad (4.3)$$

Proof: If $\tau > 0$,

$$F_Y(y) = \Pr(X \le y^{\tau}) = F_X(y^{\tau}),$$

while if $\tau < 0$,

$$F_Y(y) = \Pr(X \ge y^{\tau}) = 1 - F_X(y^{\tau}).$$

The pdf follows by differentiation. □

It is more common to keep parameters positive and so, when $\tau$ is negative,
create a new parameter $\tau^* = -\tau$. Then (4.3) becomes

$$F_Y(y) = 1 - F_X(y^{-\tau^*}), \qquad f_Y(y) = \tau^* y^{-\tau^*-1} f_X(y^{-\tau^*}).$$

Drop the asterisk for future use of this positive parameter.

Definition 4.23 When raising a distribution to a power, if $\tau > 0$ the
resulting distribution is called transformed, if $\tau = -1$ it is called inverse, and
if $\tau < 0$ (but is not $-1$) it is called inverse transformed. To create the
distributions in Appendix A and to retain $\theta$ as a scale parameter, the base
distribution should be raised to a power before being multiplied by $\theta$.

Example 4.24 Suppose X has the exponential distribution. Determine the
cdf of the inverse, transformed, and inverse transformed exponential
distributions.

The inverse exponential distribution with no scale parameter has cdf

$$F(y) = 1 - [1 - e^{-1/y}] = e^{-1/y}.$$

With the scale parameter added it is $F(y) = e^{-\theta/y}$.

The transformed exponential distribution with no scale parameter has cdf

$$F(y) = 1 - \exp(-y^{\tau}).$$

With the scale parameter added it is $F(y) = 1 - \exp[-(y/\theta)^{\tau}]$. This
distribution is more commonly known as the Weibull distribution.

The inverse transformed exponential distribution with no scale parameter
has cdf

$$F(y) = 1 - [1 - \exp(-y^{-\tau})] = \exp(-y^{-\tau}).$$

With the scale parameter added it is $F(y) = \exp[-(\theta/y)^{\tau}]$. This distribution
is the inverse Weibull. □
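The three cdfs in Example 4.24 can be generated from one helper that first raises the base variable to a power and then multiplies by $\theta$, as Definition 4.23 recommends. This is a sketch under assumed parameter values; the helper name and its signed `tau` argument (negative for the inverse cases, as in Theorem 4.22) are choices made for illustration.

```python
import math


def exponential_cdf(x):
    """Unit exponential cdf."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def power_then_scale_cdf(base_cdf, tau, theta):
    """cdf of Y = theta * X**(1/tau) for y > 0, where tau may be negative."""
    def cdf_y(y):
        z = (y / theta) ** tau
        # For tau < 0 the event {Y <= y} corresponds to {X >= z}.
        return base_cdf(z) if tau > 0 else 1.0 - base_cdf(z)

    return cdf_y


theta, y = 10.0, 5.0  # assumed values
weibull = power_then_scale_cdf(exponential_cdf, 2.0, theta)
inverse_weibull = power_then_scale_cdf(exponential_cdf, -2.0, theta)
inverse_exponential = power_then_scale_cdf(exponential_cdf, -1.0, theta)

print(weibull(y), 1.0 - math.exp(-((y / theta) ** 2)))
print(inverse_weibull(y), math.exp(-((theta / y) ** 2)))
print(inverse_exponential(y), math.exp(-theta / y))
```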


Another base distribution has pdf $f(x) = x^{\alpha-1} e^{-x}/\Gamma(\alpha)$. When a scale
parameter is added, this becomes the gamma distribution. It has inverse
and transformed versions that can be created using the results in this section.
Unlike the distributions introduced to this point, this one does not have a
closed form cdf. The best we can do is define notation for the function.

Definition 4.25 The incomplete gamma function with parameter $\alpha > 0$
is denoted and defined by

$$\Gamma(\alpha; x) = \frac{1}{\Gamma(\alpha)} \int_0^x t^{\alpha-1} e^{-t}\,dt,$$

while the gamma function is denoted and defined by

$$\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\,dt.$$

In addition, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ and for positive integer values of
n, $\Gamma(n) = (n - 1)!$. Appendix A provides details on numerical methods of
evaluating these quantities. Furthermore, these functions are built into most
spreadsheet programs and many statistical and numerical analysis programs.
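For readers without such software at hand, the incomplete gamma function in its normalized form (the integral divided by $\Gamma(\alpha)$) can be evaluated with the standard power series $e^{-x} x^{\alpha} \sum_{n \ge 0} x^n / \Gamma(\alpha + n + 1)$. The sketch below is one possible implementation, not the method of Appendix A; it is adequate for moderate x, and the tolerance and term limit are arbitrary choices.

```python
import math


def incomplete_gamma(alpha, x, tol=1e-15, max_terms=10_000):
    """Gamma(alpha; x) = (1 / Gamma(alpha)) * integral_0^x t**(alpha-1) e**(-t) dt,
    evaluated by the series e**(-x) x**alpha * sum_n x**n / Gamma(alpha + n + 1)."""
    if x <= 0:
        return 0.0
    # First term (n = 0), computed on the log scale to avoid overflow.
    term = math.exp(alpha * math.log(x) - x - math.lgamma(alpha + 1.0))
    total = term
    n = 1
    while term > tol * total and n < max_terms:
        term *= x / (alpha + n)  # ratio of consecutive series terms
        total += term
        n += 1
    return total


# For alpha = 1 the function reduces to the exponential cdf.
print(incomplete_gamma(1.0, 2.0), 1.0 - math.exp(-2.0))
# For alpha = 2 it equals 1 - e**(-x) (1 + x).
print(incomplete_gamma(2.0, 3.0), 1.0 - 4.0 * math.exp(-3.0))
# Gamma(n) = (n - 1)! for positive integers.
print(math.gamma(5), math.factorial(4))
```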


4.4.4 Exponentiation 

Theorem 4.26 Let X be a continuous random variable with pdf $f_X(x)$ and
cdf $F_X(x)$ with $f_X(x) > 0$ for all real x. Let $Y = \exp(X)$. Then, for $y > 0$,

$$F_Y(y) = F_X(\ln y), \qquad f_Y(y) = \frac{1}{y} f_X(\ln y).$$

Proof: $F_Y(y) = \Pr(e^X \le y) = \Pr(X \le \ln y) = F_X(\ln y)$. □

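Example 4.27 Let X have the normal distribution with mean $\mu$ and variance
$\sigma^2$. Determine the cdf and pdf of $Y = e^X$.

$$F_Y(y) = \Phi\left(\frac{\ln y - \mu}{\sigma}\right),$$

$$f_Y(y) = \frac{1}{y\sigma}\,\phi\left(\frac{\ln y - \mu}{\sigma}\right) = \frac{1}{y\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\ln y - \mu}{\sigma}\right)^2\right].$$

We could try to add a scale parameter by creating $W = \theta Y$, but this
adds no value, as is demonstrated in Exercise 4.22. This example created the
lognormal distribution (the name has stuck even though “expnormal” would
seem more descriptive).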
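The exponentiation result is easy to check numerically: the lognormal cdf is the normal cdf evaluated at $\ln y$, and its derivative should match the stated pdf. The following sketch uses only the standard library; the parameter values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lognormal_cdf(y, mu, sigma):
    """F_Y(y) = Phi((ln y - mu) / sigma) for y > 0."""
    return normal_cdf((math.log(y) - mu) / sigma)


def lognormal_pdf(y, mu, sigma):
    """f_Y(y) = phi((ln y - mu) / sigma) / (y * sigma) for y > 0."""
    return normal_pdf((math.log(y) - mu) / sigma) / (y * sigma)


mu, sigma, y, h = 0.5, 1.2, 2.0, 1e-5  # assumed values
slope = (lognormal_cdf(y + h, mu, sigma) - lognormal_cdf(y - h, mu, sigma)) / (2 * h)
print(slope, lognormal_pdf(y, mu, sigma))      # finite difference vs. stated pdf
print(lognormal_cdf(math.exp(mu), mu, sigma))  # the median is exp(mu)
```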


4.4.5 Mixing 

The concept of mixing can be extended from mixing a finite number of random
variables to mixing an uncountable number. In the following theorem, the pdf
$f_\Lambda(\lambda)$ plays the role of the discrete probabilities $a_j$ in the k-point mixture.

Theorem 4.28 Let X have pdf $f_{X|\Lambda}(x|\lambda)$ and cdf $F_{X|\Lambda}(x|\lambda)$, where $\lambda$ is a
parameter of X. While X may have other parameters, they are not relevant.
Let $\lambda$ be a realization of the random variable $\Lambda$ with pdf $f_\Lambda(\lambda)$. Then the
unconditional pdf of X is

$$f_X(x) = \int f_{X|\Lambda}(x|\lambda) f_\Lambda(\lambda)\,d\lambda, \qquad (4.4)$$

where the integral is taken over all values of $\lambda$ with positive probability. The
resulting distribution is a mixture distribution. The distribution function
can be determined from

$$\begin{aligned}
F_X(x) &= \int_{-\infty}^{x} \int f_{X|\Lambda}(y|\lambda) f_\Lambda(\lambda)\,d\lambda\,dy \\
&= \int \int_{-\infty}^{x} f_{X|\Lambda}(y|\lambda) f_\Lambda(\lambda)\,dy\,d\lambda \\
&= \int F_{X|\Lambda}(x|\lambda) f_\Lambda(\lambda)\,d\lambda.
\end{aligned}$$

Moments of the mixture distribution can be found from

$$E(X^k) = E[E(X^k|\Lambda)]$$

and, in particular,

$$\mathrm{Var}(X) = E[\mathrm{Var}(X|\Lambda)] + \mathrm{Var}[E(X|\Lambda)].$$
Proof: The integrand is, by definition, the joint density of X and $\Lambda$. The
integral is then the marginal density. For the expected value (assuming the
order of integration can be reversed),

$$\begin{aligned}
E(X^k) &= \int \int x^k f_{X|\Lambda}(x|\lambda) f_\Lambda(\lambda)\,d\lambda\,dx \\
&= \int \left[\int x^k f_{X|\Lambda}(x|\lambda)\,dx\right] f_\Lambda(\lambda)\,d\lambda \\
&= E[E(X^k|\Lambda)].
\end{aligned}$$

For the variance,

$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 \\
&= E[E(X^2|\Lambda)] - \{E[E(X|\Lambda)]\}^2 \\
&= E\{\mathrm{Var}(X|\Lambda) + [E(X|\Lambda)]^2\} - \{E[E(X|\Lambda)]\}^2 \\
&= E[\mathrm{Var}(X|\Lambda)] + \mathrm{Var}[E(X|\Lambda)]. \quad □
\end{aligned}$$

Note that, if $f_\Lambda(\lambda)$ is discrete, the integrals must be replaced with sums. An
alternative way to write the results is $f_X(x) = E_\Lambda[f_{X|\Lambda}(x|\Lambda)]$ and
$F_X(x) = E_\Lambda[F_{X|\Lambda}(x|\Lambda)]$, where the subscript on E indicates that the random
variable is $\Lambda$.

An interesting phenomenon is that mixture distributions tend to be
heavy-tailed, so this method is a good way to generate such a model. In particular,
if $f_{X|\Lambda}(x|\lambda)$ has a decreasing hazard rate function for all $\lambda$, then the mixture
distribution will also have a decreasing hazard rate function (see Ross [114],
pp. 407-409 for details). The following example shows how a familiar
heavy-tailed distribution may be obtained by mixing.

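Example 4.29 Let $X|\Lambda$ have an exponential distribution with parameter $1/\Lambda$.
Let $\Lambda$ have a gamma distribution. Determine the unconditional distribution
of X.

We have (note that the parameter $\theta$ in the gamma distribution has been
replaced by its reciprocal)

$$\begin{aligned}
f_X(x) &= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^\infty \lambda e^{-\lambda x} \lambda^{\alpha-1} e^{-\theta\lambda}\,d\lambda \\
&= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^\infty \lambda^{\alpha} e^{-\lambda(x+\theta)}\,d\lambda \\
&= \frac{\theta^{\alpha}}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(x+\theta)^{\alpha+1}} = \frac{\alpha\theta^{\alpha}}{(x+\theta)^{\alpha+1}}.
\end{aligned}$$

This is a Pareto distribution. □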
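The mixing integral (4.4) for this example can also be evaluated numerically and compared with the Pareto density. The sketch below uses a simple midpoint rule; the truncation point, step count, and parameter values are assumptions chosen so that the neglected tail is negligible.

```python
import math


def mixed_pdf(x, alpha, theta, steps=20_000):
    """Midpoint-rule approximation to the integral over lambda of
    (lambda e**(-lambda x)) * (theta**alpha lambda**(alpha-1) e**(-theta lambda) / Gamma(alpha))."""
    upper = 60.0 / (x + theta)  # integrand decays like exp(-lambda (x + theta))
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        conditional = lam * math.exp(-lam * x)  # exponential pdf with mean 1/lambda
        mixing = theta**alpha * lam ** (alpha - 1.0) * math.exp(-theta * lam) / math.gamma(alpha)
        total += conditional * mixing
    return total * h


def pareto_pdf(x, alpha, theta):
    """alpha theta**alpha / (x + theta)**(alpha + 1)."""
    return alpha * theta**alpha / (x + theta) ** (alpha + 1.0)


alpha, theta = 3.0, 2.0  # assumed gamma parameters for Lambda
for x in (0.5, 1.0, 5.0):
    print(x, mixed_pdf(x, alpha, theta), pareto_pdf(x, alpha, theta))
```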

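The following example provides an illustration useful in Chapter 16.

Example 4.30 Suppose that, given $\Theta = \theta$, $X$ is normally distributed with
mean $\theta$ and variance $v$, so that

$$f_{X|\Theta}(x|\theta) = \frac{1}{\sqrt{2\pi v}} \exp\left[-\frac{1}{2v}(x-\theta)^2\right], \quad -\infty < x < \infty,$$

and $\Theta$ is itself normally distributed with mean $\mu$ and variance $a$, that is,

$$f_\Theta(\theta) = \frac{1}{\sqrt{2\pi a}} \exp\left[-\frac{1}{2a}(\theta-\mu)^2\right], \quad -\infty < \theta < \infty.$$

Determine the marginal pdf of $X$.

The marginal pdf of $X$ is

$$\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi v}} \exp\left[-\frac{1}{2v}(x-\theta)^2\right] \frac{1}{\sqrt{2\pi a}} \exp\left[-\frac{1}{2a}(\theta-\mu)^2\right] d\theta \\
&= \frac{1}{2\pi\sqrt{va}} \int_{-\infty}^{\infty} \exp\left[-\frac{1}{2v}(x-\theta)^2 - \frac{1}{2a}(\theta-\mu)^2\right] d\theta.
\end{aligned}$$

We leave as an exercise for the reader the verification of the algebraic identity

$$\frac{(x-\theta)^2}{v} + \frac{(\theta-\mu)^2}{a} = \frac{a+v}{va}\left(\theta - \frac{ax+v\mu}{a+v}\right)^2 + \frac{(x-\mu)^2}{a+v},$$

obtained by completion of the square in $\theta$. Thus,

$$f_X(x) = \frac{\exp\left[-\dfrac{(x-\mu)^2}{2(a+v)}\right]}{\sqrt{2\pi(a+v)}} \int_{-\infty}^{\infty} \sqrt{\frac{a+v}{2\pi va}}\, \exp\left[-\frac{a+v}{2va}\left(\theta - \frac{ax+v\mu}{a+v}\right)^2\right] d\theta.$$

We recognize the integrand as the pdf (as a function of $\theta$) of a normal
distribution with mean $(ax + v\mu)/(a+v)$ and variance $(va)/(a+v)$. Thus the
integral is 1 and so

$$f_X(x) = \frac{\exp\left[-\dfrac{(x-\mu)^2}{2(a+v)}\right]}{\sqrt{2\pi(a+v)}}, \quad -\infty < x < \infty;$$

that is, $X$ is normal with mean $\mu$ and variance $a + v$. □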
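The conclusion that a normal mean mixed over a normal distribution gives a normal marginal with the variances added can be verified numerically. The sketch below is a plain trapezoidal integration over $\theta$; the values of $\mu$, $a$, and $v$ and the integration width of 12 standard deviations are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, variance):
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def marginal_pdf(x, mu, a, v, steps=4000):
    """Trapezoidal integration of f(x | theta) * f(theta) over theta."""
    half_width = 12 * math.sqrt(a)          # covers essentially all of the mixing mass
    lower, upper = mu - half_width, mu + half_width
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lower + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(x, theta, v) * normal_pdf(theta, mu, a)
    return total * h


mu, a, v = 100.0, 25.0, 9.0   # illustrative values
for x in (90.0, 100.0, 103.0):
    print(x, marginal_pdf(x, mu, a, v), normal_pdf(x, mu, a + v))
```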


The following example is taken from Hayne [50]. It illustrates how this [...]
can arise. [...] of a parameter is not known, but a probability density function can be
elicited to describe possible values of that parameter.

Example 4.31 In the valuation of warranties on automobiles it is important
to recognize that the number of miles driven varies from driver to driver. It is
[...] Determine the distribution for the number of miles driven in a randomly
selected year by a randomly selected driver.

Using the parameterizations from Appendix A, the inverse Weibull for miles
driven in a year has parameters $\Lambda$ (in place of $\Theta$) and $\tau$ while the transformed [...]

[...] analysis of lifetime distributions in survival analysis, the resulting mathematical
convenience implies that the approach may also be viewed as a useful way
to generate new distributions by mixing.

We begin by introducing a frailty random variable $\Lambda > 0$ and define the
conditional hazard rate (given $\Lambda = \lambda$) of $X$ to be $h_{X|\Lambda}(x|\lambda) = \lambda a(x)$, where
$a(x)$ is a known function of $x$ [that is, $a(x)$ is to be specified in a particular
application]. The frailty is meant to quantify uncertainty associated with the
hazard rate, which by the above specification of the conditional hazard rate
acts in a multiplicative manner.

The conditional survival function of $X|\Lambda$ is therefore

$$S_{X|\Lambda}(x|\lambda) = \exp\left[-\int_0^x h_{X|\Lambda}(t|\lambda)\, dt\right] = e^{-\lambda A(x)},$$

where $A(x) = \int_0^x a(t)\, dt$. In order to specify the mixture distribution (that is,
the marginal distribution of $X$), we define the moment generating function
of the frailty random variable $\Lambda$ to be $M_\Lambda(t) = E(e^{t\Lambda})$. Then, the marginal
survival function is

$$S_X(x) = E[e^{-\Lambda A(x)}] = M_\Lambda[-A(x)], \tag{4.5}$$

and obviously $F_X(x) = 1 - S_X(x)$.

The type of mixture to be used determines the choice of $a(x)$ and hence [...]
generating function $M_\Lambda(t)$ of $\Lambda$. [...] but other choices such as inverse [...]

Example 4.32 Let $\Lambda$ have a gamma distribution and let $X|\Lambda$ have a Weibull
distribution with conditional survival function $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x^\gamma}$. Determine
the unconditional or marginal distribution of $X$.

In this case it follows from Example 3.15 that the gamma moment generating
function is $M_\Lambda(t) = (1 - \theta t)^{-\alpha}$, and from (4.5) it follows that $X$ has
survival function

$$S_X(x) = M_\Lambda(-x^\gamma) = (1 + \theta x^\gamma)^{-\alpha}.$$

This is a Burr distribution (see Appendix A) with the usual parameter $\theta$
replaced by $\theta^{-1/\gamma}$. Note that when $\gamma = 1$ this is an exponential mixture
which is a Pareto distribution, considered previously in Example 4.29. □
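A simulation can illustrate the gamma-frailty Weibull mixture. The sketch below assumes the conditional survival function $e^{-\lambda x^\gamma}$ and a gamma frailty with shape $\alpha$ and scale $\theta$, and compares the empirical survival probability with $(1 + \theta x^\gamma)^{-\alpha}$. The parameter values, sample size, and seed are illustrative assumptions; a loss is drawn as $(E/\lambda)^{1/\gamma}$ with $E$ a unit exponential, which has the required conditional survival function.

```python
import random


def burr_survival(x, alpha, theta, gamma_):
    """Marginal survival function (1 + theta * x^gamma)^(-alpha)."""
    return (1.0 + theta * x ** gamma_) ** (-alpha)


def simulate_survival(x, alpha, theta, gamma_, n=200_000, seed=12345):
    """Estimate Pr(X > x) by drawing the frailty first, then the loss."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n):
        lam = rng.gammavariate(alpha, theta)      # frailty: shape alpha, scale theta
        unit_exp = rng.expovariate(1.0)
        loss = (unit_exp / lam) ** (1.0 / gamma_)  # Pr(loss > x | lam) = exp(-lam * x^gamma)
        if loss > x:
            exceed += 1
    return exceed / n


alpha, theta, gamma_ = 2.0, 0.5, 1.5   # illustrative values
print(simulate_survival(1.2, alpha, theta, gamma_), burr_survival(1.2, alpha, theta, gamma_))
```

The simulated and closed-form values should differ only by sampling error, which is of order 0.001 at this sample size.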


As mentioned earlier, mixing tends to create heavy-tailed distributions, [...]

4.4.7 Splicing

Another method for creating a new distribution is by splicing. This approach
is similar to mixing in that it might be believed that two or more separate
processes are responsible for generating the losses. With mixing, the various
processes operate on subsets of the population. Once the subset is identified,
a simple loss model suffices. For splicing, the processes differ with regard to
the loss amount. That is, one model governs the behavior of losses in some
interval of possible losses while other models cover the other intervals. The
following definition makes this precise.

Definition 4.33 A k-component spliced distribution has a density function
that can be expressed as follows:

$$f_X(x) = \begin{cases}
a_1 f_1(x), & c_0 < x < c_1, \\
a_2 f_2(x), & c_1 < x < c_2, \\
\quad\vdots & \quad\vdots \\
a_k f_k(x), & c_{k-1} < x < c_k.
\end{cases}$$

For $j = 1, \ldots, k$, each $a_j > 0$ and each $f_j(x)$ must be a legitimate density
function with all probability on the interval $(c_{j-1}, c_j)$. Also, $a_1 + \cdots + a_k = 1$.

Example 4.34 Demonstrate that Model 5 on Page 21 is a two-component
spliced model.

The density function is

$$f(x) = \begin{cases} 0.01, & 0 \le x < 50, \\ 0.02, & 50 \le x < 75, \end{cases}$$

and the spliced model is created by letting $f_1(x) = 0.02$, $0 \le x < 50$, which
is a uniform distribution on the interval from 0 to 50, and $f_2(x) = 0.04$,
$50 \le x < 75$, which is a uniform distribution on the interval from 50 to 75.
The coefficients are then $a_1 = 0.5$ and $a_2 = 0.5$. □
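A spliced density of this form is easy to express in code. The sketch below is one possible representation, with the function name `spliced_pdf` and the list-based layout being assumptions: each component carries a weight, a density, and a pair of break points. It then builds the two-uniform model with weights 0.5 and 0.5 and checks that the total probability is 1.

```python
def spliced_pdf(x, weights, densities, breaks):
    """Spliced density: a_j * f_j(x) on the j-th interval, 0 elsewhere.

    weights   : [a_1, ..., a_k], summing to 1
    densities : [f_1, ..., f_k], each a proper density on its own interval
    breaks    : [c_0, c_1, ..., c_k]; intervals are treated as [c_{j-1}, c_j)
    """
    for j, (a_j, f_j) in enumerate(zip(weights, densities)):
        if breaks[j] <= x < breaks[j + 1]:
            return a_j * f_j(x)
    return 0.0


# Two uniform components on (0, 50) and (50, 75), each with weight 0.5.
weights = [0.5, 0.5]
densities = [lambda x: 0.02, lambda x: 0.04]
breaks = [0.0, 50.0, 75.0]

# Midpoint sum over (0, 75) with step 0.5; exact here because the density is piecewise constant.
step = 0.5
total = sum(spliced_pdf((i + 0.5) * step, weights, densities, breaks) * step
            for i in range(150))
print(spliced_pdf(25.0, weights, densities, breaks),
      spliced_pdf(60.0, weights, densities, breaks),
      total)
```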

It was not necessary to use density functions and coefficients, but this is
one way to ensure that the result is a legitimate density function. When using
parametric models, the motivation for splicing is that the tail behavior may
be inconsistent with the behavior for small losses. For example, experience
(based on knowledge beyond that available in the current, perhaps small,
data set) may indicate that the tail follows the Pareto distribution, but there
is a positive mode more in keeping with the lognormal or inverse Gaussian
distributions. A second instance is when there is a large amount of data below
some value but a limited amount of information elsewhere. We may want to
use the empirical distribution (or a smoothed version of it) up to a certain
point and a parametric model beyond that value. The definition given above
is appropriate when the break [...]

Another way to construct a spliced model is to use [...] to have the break [...]
points). Such a restriction could be added to the specification.

Example 4.35 Create a two-component spliced model using an exponential
distribution from 0 to $c$ and a Pareto distribution (using $\gamma$ in place of $\theta$)
from $c$ to $\infty$.

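The basic format is

$$f_X(x) = \begin{cases}
v\, \dfrac{\theta^{-1} e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c, \\[2ex]
(1-v)\, \dfrac{\alpha\gamma^\alpha (x+\gamma)^{-\alpha-1}}{\gamma^\alpha (c+\gamma)^{-\alpha}}, & c < x < \infty.
\end{cases}$$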
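To see how an exponential piece below $c$ and a Pareto piece above $c$ fit together, the following sketch evaluates such a spliced density. It assumes a weight, here called `v` (an assumed symbol), on the truncated exponential piece and `1 - v` on the truncated Pareto piece; all parameter values are illustrative. It checks that the probability below $c$ is `v` and shows that the density need not be continuous at $c$.

```python
import math


def exp_pareto_splice_pdf(x, v, theta, alpha, gamma_, c):
    """Exponential truncated to (0, c) with weight v; Pareto truncated to (c, inf) with weight 1 - v."""
    if 0 < x < c:
        return v * math.exp(-x / theta) / (theta * (1 - math.exp(-c / theta)))
    if x >= c:   # the break point itself is assigned to the Pareto piece here
        return (1 - v) * alpha * (c + gamma_) ** alpha / (x + gamma_) ** (alpha + 1)
    return 0.0


def probability_below(c, v, theta, alpha, gamma_, steps=10000):
    """Midpoint-rule integral of the density over (0, c); should be close to v."""
    h = c / steps
    return h * sum(exp_pareto_splice_pdf((i + 0.5) * h, v, theta, alpha, gamma_, c)
                   for i in range(steps))


v, theta, alpha, gamma_, c = 0.6, 100.0, 4.0, 200.0, 100.0   # illustrative values
left_limit = exp_pareto_splice_pdf(c - 1e-9, v, theta, alpha, gamma_, c)
right_value = exp_pareto_splice_pdf(c, v, theta, alpha, gamma_, c)
print(probability_below(c, v, theta, alpha, gamma_), left_limit, right_value)
```

With these values the density jumps upward at the break point, which is the kind of discontinuity that a continuity restriction would remove.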




















addition of a third parameter has not created a new distribution. 

4.23 (*) Let $X$ have a Pareto distribution with parameters $\alpha$ and $\theta$. Let
$Y = \ln(1 + X/\theta)$. Determine the name of the distribution of $Y$ and its
parameters.

4.24 In [132], Venter noted that if $X$ has the transformed gamma distribution
and its scale parameter $\theta$ has an inverse transformed gamma distribution
(where the parameter $\tau$ is the same in both distributions) the resulting mixture
has the transformed beta distribution. Demonstrate that this is true.

4.25 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a
gamma distribution with mean 1 and variance 2. Determine the unconditional
probability that $N = 1$.

4.26 (*) Given a value of $\Theta = \theta$, the random variable $X$ has an exponential
distribution with hazard rate function $h(x) = \theta$, a constant. The random
variable $\Theta$ has a uniform distribution on the interval $(1, 11)$. Determine $S_X(0.5)$
for the unconditional distribution.

4.27 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a uniform
distribution on the interval $(0, 5)$. Determine the unconditional probability
that $N > 2$.


frailty distribution. 


4.29 Suppose that $X|\Lambda$ has the Weibull survival function $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x^\gamma}$,
$x > 0$, and $\Lambda$ has an exponential distribution. Demonstrate that the unconditional
distribution of $X$ is loglogistic.

4.30 Consider the exponential-inverse Gaussian frailty model with $a(x) =
\theta/(2\sqrt{1 + \theta x})$, where $\theta > 0$.

(a) Verify that the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ of $X|\Lambda$ is indeed
a valid hazard rate.

(b) Determine the conditional survival function $S_{X|\Lambda}(x|\lambda)$.

(c) If $\Lambda$ has a gamma distribution with parameters $\theta = 1$ and $\alpha$ replaced
by $2\alpha$, determine the marginal or unconditional survival
function of $X$.

(d) Use (c) to argue that a given frailty model may arise from more
than one combination of conditional distributions of $X|\Lambda$ and frailty
distributions of $\Lambda$.

4.31 Suppose that $X$ has survival function $S_X(x) = 1 - F_X(x)$ given by (4.5).
Show that $S_1(x) = F_X(x)/[E(\Lambda)A(x)]$ is again a survival function of the form
given by (4.5), and identify the distribution of $\Lambda$ associated with $S_1(x)$.


4.32 Fix $s > 0$, and define an "Esscher-transformed" frailty random variable
$\Lambda_s$ with probability density function (or discrete probability mass function in
the discrete case) $f_{\Lambda_s}(\lambda) = e^{-s\lambda} f_\Lambda(\lambda)/M_\Lambda(-s)$, $\lambda > 0$.

(a) Show that $\Lambda_s$ has moment generating function

$$M_{\Lambda_s}(t) = E(e^{t\Lambda_s}) = \frac{M_\Lambda(t - s)}{M_\Lambda(-s)}.$$

(b) Define the cumulant generating function of $\Lambda$ to be

$$c_\Lambda(t) = \ln[M_\Lambda(t)],$$

and use (a) to prove that

$$c'_\Lambda(-s) = E(\Lambda_s) \quad \text{and} \quad c''_\Lambda(-s) = \mathrm{Var}(\Lambda_s).$$

(c) For the frailty model with survival function given by (4.5), prove
that the associated hazard rate may be expressed as $h_X(x) =
a(x)\, c'_\Lambda[-A(x)]$, where $c_\Lambda$ is defined in (b).

(d) Use (c) to show that

$$h'_X(x) = a'(x)\, c'_\Lambda[-A(x)] - [a(x)]^2\, c''_\Lambda[-A(x)].$$

(e) Prove using (d) that, if the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ is
nonincreasing in $x$, then $h_X(x)$ is also nonincreasing in $x$.



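[...]

$$f(x) = \sqrt{\frac{\theta}{2\pi x^3}}\, \exp\left[-\frac{\theta}{2x}\left(\frac{x-\mu}{\mu}\right)^2\right], \quad x > 0,$$

where $\theta > 0$ and $\mu > 0$ are parameters.

(a) Derive the pdf of the reciprocal inverse Gaussian random variable
$1/X$.

(b) Prove that the "joint" moment generating function of $X$ and $1/X$
is given by

$$\begin{aligned}
M(t_1, t_2) &= E\left(e^{t_1 X + t_2 X^{-1}}\right) \\
&= \sqrt{\frac{\theta}{\theta - 2t_2}}\, \exp\left(\frac{\theta - \sqrt{(\theta - 2\mu^2 t_1)(\theta - 2t_2)}}{\mu}\right),
\end{aligned}$$

where $t_1 < \theta/(2\mu^2)$ and $t_2 < \theta/2$.

(c) Use (b) to show that the moment generating function of $X$ is

$$M_X(t) = E(e^{tX}) = \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2}{\theta}\, t}\right)\right], \quad t < \frac{\theta}{2\mu^2},$$

and that this agrees with the reparameterized result in Exercise
3.24.

(d) Use (b) to show that the reciprocal inverse Gaussian random variable
$1/X$ has moment generating function

$$M_{1/X}(t) = E\left(e^{tX^{-1}}\right) = \sqrt{\frac{\theta}{\theta - 2t}}\, \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2}{\theta}\, t}\right)\right], \quad t < \frac{\theta}{2}.$$

Hence prove that $1/X$ has the same distribution as $Z_1 + Z_2$, where
$Z_1$ has a gamma distribution, $Z_2$ has an inverse Gaussian distribution,
and $Z_1$ is independent of $Z_2$. Also, identify the gamma and
inverse Gaussian parameters in this representation.

(e) Use (b) to show that

$$Z = \frac{1}{X}\left(\frac{X - \mu}{\mu}\right)^2$$

has a gamma distribution with parameters $\alpha = \frac{1}{2}$ and the usual
parameter $\theta$ (in Appendix A) replaced by $2/\theta$.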

4.5 SELECTED DISTRIBUTIONS AND THEIR RELATIONSHIPS 


4.5.1 Introduction 




























Fig. 4.6 Transformed/inverse transformed gamma family.


4.5.3 Limiting distributions

The classification in the preceding section involved distributions that are special
cases of other distributions. Another way to relate distributions is to see
what happens as parameters go to their limiting values of zero [...]

Example 4.36 Show that the transformed gamma distribution is a limiting
case of the transformed beta distribution as $\theta \to \infty$, $\alpha \to \infty$, [...]
a constant.

The demonstration relies on two facts concerning limits:

[...]

The limit in (4.6) is known as Stirling's formula and provides an approximation
for the gamma function. The limit in (4.7) is a standard result found in most
calculus texts.





Fig. 4.7 Distributional relationships and characteristics.


We now formalize some of the [...] discrete phenomena. The probability function [...]
that exactly $k$ events (such as claims or [...]
variable representing the number of such [...]

$p_k = \Pr(N = k)$, [...]

As a reminder, the probability generating [...]

For the Poisson distribution the variance is equal to the mean. The Poisson
distribution can arise from a Poisson process (to be discussed in Chapter 8).
The Poisson distribution and Poisson processes are also discussed in many
textbooks in statistics and actuarial science, including Panjer and Willmot
[106] and Ross [116].

The Poisson distribution has at least two additional useful properties. The 
first is given in the following theorem. 

Theorem 4.37 Let $N_1, \ldots, N_n$ be independent Poisson variables with parameters
$\lambda_1, \ldots, \lambda_n$. Then $N = N_1 + \cdots + N_n$ has a Poisson distribution with
parameter $\lambda_1 + \cdots + \lambda_n$.

Proof: The pgf of the sum of independent random variables is the product
of the individual pgfs. For the sum of Poisson random variables we have

$$\begin{aligned}
P_N(z) &= \prod_{j=1}^{n} P_{N_j}(z) = \prod_{j=1}^{n} \exp[\lambda_j(z-1)] \\
&= \exp\left[\sum_{j=1}^{n} \lambda_j (z-1)\right] \\
&= e^{\lambda(z-1)},
\end{aligned}$$

where $\lambda = \lambda_1 + \cdots + \lambda_n$. Just as is true with moment generating functions, the
pgf is unique and therefore $N$ must have a Poisson distribution with parameter
$\lambda$. □
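The additivity of independent Poisson variables can also be seen by convolving probability functions directly. The sketch below uses three illustrative Poisson means, an assumption made for the example, convolves their truncated probability functions, and compares the low-order terms with the Poisson probabilities for the summed mean.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def convolve(p, q):
    """Probability function of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def sum_of_poissons_pmf(lams, size=40):
    """Convolve truncated Poisson probability functions; terms below `size` are exact."""
    pmf = [1.0]
    for lam in lams:
        pmf = convolve(pmf, [poisson_pmf(k, lam) for k in range(size)])
    return pmf


lams = [0.7, 1.1, 2.2]   # illustrative means
combined = sum_of_poissons_pmf(lams)
for k in range(5):
    print(k, combined[k], poisson_pmf(k, sum(lams)))
```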


The second property is particularly useful in modeling insurance risks. Suppose
that the number of claims in a fixed time period, such as one year, follows
a Poisson distribution. Further suppose that the claims can be classified into
$m$ distinct types. For example, claims could be classified by size, such as
those below a fixed limit and those above the limit. It turns out that, if one is
interested in studying the number of claims above the limit, that distribution
is also Poisson but with a new Poisson parameter.

This is also useful when considering removing or adding a part of an insurance
coverage. Suppose that the number of claims for a complicated medical
benefit coverage follows a Poisson distribution. Consider the “types” of claims
to be the different medical procedures or medical benefits under the plan. If
one of the benefits is removed from the plan, again it turns out that the
distribution of the number of claims under the revised plan will still have a Poisson
distribution but with a new parameter.

In each of the cases mentioned in the previous paragraph, the number of
claims of the different types will not only be Poisson distributed but also be
independent of each other; that is, the distributions of the number of claims
above the limit and the number below the limit will be independent. This
is a somewhat surprising result. For example, suppose we currently sell a
policy with a deductible of 50 and experience has indicated that a Poisson
distribution with a certain parameter is a valid model for the number of
payments. Further suppose we are also comfortable with the assumption that
the number of losses in a period also has the Poisson distribution but we do
not know the parameter. Without additional information, it is impossible to
infer the value of the Poisson parameter should the deductible be lowered or
removed entirely. We now formalize these ideas in the following theorem.

Theorem 4.38 Suppose that the number of events $N$ is a Poisson random
variable with mean $\lambda$. Further suppose that each event can be classified into
one of $m$ types with probabilities $p_1, \ldots, p_m$ independent of all other events.
Then the number of events $N_1, \ldots, N_m$ corresponding to event types $1, \ldots, m$,
respectively, are mutually independent Poisson random variables with means
$\lambda p_1, \ldots, \lambda p_m$, respectively.

Proof: For fixed $N = n$, the conditional joint distribution of $(N_1, \ldots, N_m)$
is multinomial with parameters $(n, p_1, \ldots, p_m)$. Also, for fixed $N = n$, the
conditional marginal distribution of $N_j$ is binomial with parameters $(n, p_j)$.
The joint pf of $(N_1, \ldots, N_m)$ is given by

$$\begin{aligned}
\Pr(N_1 = n_1, \ldots, N_m = n_m) &= \Pr(N_1 = n_1, \ldots, N_m = n_m \mid N = n) \times \Pr(N = n) \\
&= \frac{n!}{n_1! \cdots n_m!}\, p_1^{n_1} \cdots p_m^{n_m}\, \frac{e^{-\lambda} \lambda^n}{n!} \\
&= \prod_{j=1}^{m} e^{-\lambda p_j}\, \frac{(\lambda p_j)^{n_j}}{n_j!},
\end{aligned}$$

where $n = n_1 + n_2 + \cdots + n_m$. Similarly, the marginal pf of $N_j$ is determined
below.

$$\begin{aligned}
\Pr(N_j = n_j) &= \sum_{n=n_j}^{\infty} \Pr(N_j = n_j \mid N = n)\, \Pr(N = n) \\
&= \sum_{n=n_j}^{\infty} \binom{n}{n_j} p_j^{n_j} (1 - p_j)^{n - n_j}\, \frac{e^{-\lambda} \lambda^n}{n!} \\
&= e^{-\lambda}\, \frac{(\lambda p_j)^{n_j}}{n_j!} \sum_{n=n_j}^{\infty} \frac{[\lambda(1 - p_j)]^{n - n_j}}{(n - n_j)!} \\
&= e^{-\lambda}\, \frac{(\lambda p_j)^{n_j}}{n_j!}\, e^{\lambda(1 - p_j)} \\
&= e^{-\lambda p_j}\, \frac{(\lambda p_j)^{n_j}}{n_j!}.
\end{aligned}$$

Hence the joint pf is the product of the marginal pfs, establishing mutual
independence. □

Example 4.39 In a study of medical insurance the expected number of claims
per individual policy is 2.3 and the number of claims is Poisson distributed.
You are considering removing one medical procedure from the coverage under
this policy. Based on historical studies, this procedure accounts for approximately
10% of the claims. Determine the new frequency distribution.

From Theorem 4.38, we know that the distribution of the number of claims
expected under the revised insurance policy after removing the procedure
from coverage is Poisson with mean 0.9(2.3) = 2.07. In carrying out studies
of the distribution of total claims, and hence the appropriate premium under
the new policy, one also needs to study the change in the amounts of losses,
the severity distribution, because the distribution of amounts of losses for
the procedure which was removed may be different from the distribution of
amounts when all procedures are covered. □
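The thinning result behind this calculation can be illustrated by simulation. The sketch below draws Poisson claim counts with mean 2.3, assigns each claim independently to the removed procedure with probability 0.10, and reports the mean and variance of the remaining claims along with their covariance with the removed claims. The sample size, seed, and the use of Knuth's multiplication method to draw Poisson variates are choices made for this illustration.

```python
import math
import random


def poisson_sample(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_thinning(lam=2.3, p_removed=0.10, n=100_000, seed=2024):
    """Return per-policy counts of claims kept and claims removed from coverage."""
    rng = random.Random(seed)
    kept, removed = [], []
    for _ in range(n):
        total = poisson_sample(rng, lam)
        gone = sum(1 for _ in range(total) if rng.random() < p_removed)
        kept.append(total - gone)
        removed.append(gone)
    return kept, removed


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


kept, removed = simulate_thinning()
print(mean(kept), covariance(kept, kept), covariance(kept, removed))
```

The first two printed values should both be near 2.07, as expected for a Poisson count, and the covariance should be near 0, consistent with independence of the two claim types.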


4.6.3 The negative binomial distribution 

The negative binomial distribution has been used extensively as an alternative 
to the Poisson distribution. Like the Poisson distribution, it has positive 
probabilities on the nonnegative integers. Because it has two parameters, it 
has more flexibility in shape than the Poisson. 

Definition 4.40 The probability function of the negative binomial distribution
is given by

$$\Pr(N = k) = p_k = \binom{k + r - 1}{k} \left(\frac{1}{1+\beta}\right)^{r} \left(\frac{\beta}{1+\beta}\right)^{k}, \quad k = 0, 1, 2, \ldots, \quad r > 0,\ \beta > 0. \tag{4.9}$$

The binomial coefficient is to be evaluated as

$$\binom{x}{k} = \frac{x(x-1)\cdots(x-k+1)}{k!}.$$

While $k$ must be an integer, $x$ may be any real number. When $x > k - 1$, it [...]

[...] From this it follows that the mean and variance of the negative binomial
distribution are

$$E(N) = r\beta \quad \text{and} \quad \mathrm{Var}(N) = r\beta(1+\beta).$$

Because $\beta$ is positive, the variance of the negative binomial distribution
exceeds the mean. This is in contrast to the Poisson distribution for which
the variance is equal to the mean. This suggests that for a particular set of
data, if the observed variance is larger than the observed mean, the negative
binomial might be a better candidate than the Poisson distribution as a model
to be used.
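The negative binomial probabilities are straightforward to compute for non-integer $r$ if the binomial coefficient is written with gamma functions, $\binom{k+r-1}{k} = \Gamma(k+r)/[\Gamma(r)\,k!]$. The sketch below, whose function name and parameter values are assumptions, works on the log scale for numerical stability and confirms the mean $r\beta$ and variance $r\beta(1+\beta)$ by direct summation.

```python
import math


def neg_binomial_pmf(k, r, beta):
    """Negative binomial probability with real r > 0, computed on the log scale."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    log_prob = log_coeff + k * math.log(beta) - (k + r) * math.log1p(beta)
    return math.exp(log_prob)


def moments(r, beta, terms=400):
    """Mean and variance by direct summation; the tail beyond `terms` is negligible here."""
    probs = [neg_binomial_pmf(k, r, beta) for k in range(terms)]
    mean = sum(k * p for k, p in enumerate(probs))
    second = sum(k * k * p for k, p in enumerate(probs))
    return mean, second - mean ** 2


r, beta = 2.5, 0.5   # illustrative values
print(moments(r, beta), (r * beta, r * beta * (1 + beta)))
```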

The negative binomial distribution is a generalization of the Poisson in 
at least two different ways, namely as a mixed Poisson distribution with a 
gamma mixing distribution (demonstrated later in this subsection) and as 
a compound Poisson distribution with a logarithmic secondary distribution 
(see Section 4.6.7). Another view of the Poisson distribution is presented in 
Chapter 8. There, among other assumptions, the rate at which claims occur 
is assumed constant over time. If the rate is linearly increasing with regard to 
the number of past claims, then the number of claims in any period will have 
the negative binomial distribution. See Insurance Risk Models [106, Theorem 
3.6.1] for this derivation of the negative binomial distribution. 

The geometric distribution is the special case of the negative binomial
distribution when $r = 1$. The geometric distribution is, in some senses, the
discrete analogue of the continuous exponential distribution. Both the geometric
and exponential distributions have an exponentially decaying probability
function and hence the memoryless property. The memoryless property
can be interpreted in various contexts as follows. If the exponential distribution
is a distribution of lifetimes, then the expected future lifetime is constant
for any age. If the exponential distribution describes the size of insurance
claims, then the memoryless property can be interpreted as follows: Given
that a claim exceeds a certain level $d$, the expected amount of the claim in
excess of $d$ is constant and so does not depend on $d$. That is, if a deductible
of $d$ is imposed, the expected payment per claim will be unchanged, but of
course the expected number of payments will decrease. If the geometric distribution
describes the number of claims, then the memoryless property can
be interpreted as follows: Given that there are at least $m$ claims, the probability
distribution of the number of claims in excess of $m$ does not depend on
$m$. Among continuous distributions, the exponential distribution is used to
distinguish between subexponential distributions with heavy (or fat) tails and
distributions with light (or thin) tails. Similarly for frequency distributions,
distributions that decay in the tail slower than the geometric distribution are
often considered to have long tails, whereas distributions that decay more
rapidly than the geometric have short tails. The negative binomial distribution
has a long tail (decays more slowly than the geometric distribution) when
$r < 1$ and a lighter tail than the geometric distribution when $r > 1$.






















[...] distribution, in this case $u(\lambda)$, over the population of drivers. The true [...]
unobservable. All we observe are the number of accidents coming [...]
driver. There is now an additional degree of uncertainty, that is, [...]
about the parameter.

[...]

From the definition of the gamma distribution in Appendix A, this expression
can be evaluated as

$$p_k = \frac{\Gamma(k+\alpha)}{k!\,\Gamma(\alpha)}\, \frac{\theta^k}{(1+\theta)^{k+\alpha}}.$$

This formula is of the same form as (4.9), demonstrating that the mixed
Poisson, with a gamma mixing distribution, is the same as a negative binomial
distribution.
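The equivalence can be confirmed numerically by integrating the Poisson probability against a gamma density and comparing with the negative binomial formula, taking $r = \alpha$ and $\beta = \theta$. The sketch below uses a composite Simpson rule; the parameter values, the upper integration limit, and the helper names are assumptions made for illustration.

```python
import math


def gamma_pdf(lam, alpha, theta):
    """Gamma density with shape alpha and scale theta."""
    return (lam ** (alpha - 1) * math.exp(-lam / theta)
            / (math.gamma(alpha) * theta ** alpha))


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mixed_poisson_pmf(k, alpha, theta, upper=60.0, steps=60000):
    """Simpson's rule for the integral of Poisson(k; lam) * gamma(lam) over (0, upper)."""
    h = upper / steps

    def f(lam):
        return poisson_pmf(k, lam) * gamma_pdf(lam, alpha, theta)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


def neg_binomial_pmf(k, r, beta):
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + k * math.log(beta) - (k + r) * math.log1p(beta))


alpha, theta = 3.0, 0.5   # illustrative values
for k in range(4):
    print(k, mixed_poisson_pmf(k, alpha, theta), neg_binomial_pmf(k, alpha, theta))
```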

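It is worth noting that the Poisson distribution is a limiting case of the
negative binomial distribution. To see this, let $r$ go to infinity and $\beta$ go to
zero while keeping their product constant. Let $\lambda = r\beta$ be that constant.
Substituting $\beta = \lambda/r$ in the pgf leads to (using L'Hôpital's rule in lines 3 and
5)

$$\begin{aligned}
\lim_{r\to\infty} \left[1 - \frac{\lambda(z-1)}{r}\right]^{-r}
&= \exp\left\{ \lim_{r\to\infty} -r \ln\left[1 - \frac{\lambda(z-1)}{r}\right] \right\} \\
&= \exp\left\{ -\lim_{r\to\infty} \frac{\ln[1 - \lambda(z-1)/r]}{r^{-1}} \right\} \\
&= \exp\left\{ \lim_{r\to\infty} \frac{[1 - \lambda(z-1)/r]^{-1}\, \lambda(z-1)/r^2}{r^{-2}} \right\} \\
&= \exp\left\{ \lim_{r\to\infty} \frac{r\lambda(z-1)}{r - \lambda(z-1)} \right\} \\
&= \exp\left\{ \lim_{r\to\infty} [\lambda(z-1)] \right\} \\
&= \exp[\lambda(z-1)],
\end{aligned}$$

which is the pgf of the Poisson distribution.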
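The limit can be watched numerically. In the sketch below, the product $r\beta$ is held at an illustrative value of 2 while $r$ grows, and the largest absolute gap between the negative binomial and Poisson probabilities over $k = 0, \ldots, 20$ is reported. The chosen values of $r$ and the function names are assumptions.

```python
import math


def neg_binomial_pmf(k, r, beta):
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + k * math.log(beta) - (k + r) * math.log1p(beta))


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_gap(lam, r, k_max=20):
    """Largest absolute difference between NB(r, lam/r) and Poisson(lam) probabilities."""
    beta = lam / r
    return max(abs(neg_binomial_pmf(k, r, beta) - poisson_pmf(k, lam))
               for k in range(k_max + 1))


lam = 2.0
gaps = [max_gap(lam, r) for r in (10, 100, 1000, 10000)]
print(gaps)
```

The gaps should shrink roughly in proportion to $1/r$.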


4.6.4 The binomial distribution 

The binomial distribution is another counting distribution that arises naturally
in claim number modeling. It possesses some properties different from
the Poisson and the negative binomial that make it particularly useful. First,
its variance is smaller than its mean. This makes it useful for data sets in
[...] insurance [...]

[...] of claims for a single person follows a Bernoulli [...]
with probability $1 - q$ at 0 and probability $q$ at 1 [...]
of the number of claims per individual [...] probability generating
[...] given by [...] the probability generating
functions can be multiplied together to give the probability generating
function of the total number of claims arising from the group of [...]

That probability generating function is

$$P(z) = [1 + q(z-1)]^m, \quad 0 < q < 1.$$

Then from this it is easy to show that the probability of exactly [...]
the group is

$$p_k = \Pr(N = k) = \binom{m}{k} q^k (1-q)^{m-k}, \quad k = 0, 1, \ldots, m,$$

the pf for a binomial distribution with parameters $m$ and $q$. [...]
Bernoulli trial framework, it is clear that at most $m$ events (claims) can occur.
Hence, the distribution only has positive probabilities on the nonnegative
integers up to and including $m$.

Consequently, an additional attribute of the binomial distribution that is
sometimes useful is that it has finite support; that is, the range of values for
which there exist positive probabilities has finite length. This [...]
accident or the number of family members covered under a health insurance
policy. In each case it is reasonable to have an upper limit on the range
of possible values. It is useful also in connection with situations where it is
believed that it is unreasonable to assign positive probabilities beyond some
point. For example, if one is modeling the number of accidents per automobile
during a one-year period, it is probably physically impossible for there to be
more than some number, say 12, of claims during the year given the time
it would take to repair the automobile between accidents. If a model with
probabilities that extend beyond 12 were used, those probabilities should be
very small so that they have little impact on any decisions that are made.
The mean and variance of the binomial distribution are given by

$$E(N) = mq, \quad \mathrm{Var}(N) = mq(1-q).$$

Table 4.1 Members of the (a, b, 0) class

Distribution         a            b                 p_0
Poisson              0            λ                 e^{-λ}
Binomial             -q/(1-q)     (m+1)q/(1-q)      (1-q)^m
Negative binomial    β/(1+β)      (r-1)β/(1+β)      (1+β)^{-r}
Geometric            β/(1+β)      0                 (1+β)^{-1}

4.6.5 The (a, b, 0) class

The following definition characterizes the members of this class of distributions.

Definition 4.41 Let $p_k$ be the pf of a discrete random variable. It is a member
of the $(a, b, 0)$ class of distributions, provided that there exist constants
$a$ and $b$ such that

$$\frac{p_k}{p_{k-1}} = a + \frac{b}{k}, \quad k = 1, 2, 3, \ldots.$$

This recursion describes the relative size of successive probabilities in the
counting distribution. The probability at zero, $p_0$, can be obtained from the
recursive formula because the probabilities must add up to 1. This provides
a boundary condition. The $(a, b, 0)$ class of distributions is a two-parameter
class, the two parameters being $a$ and $b$. By substituting in the probability
function for each of the Poisson, binomial, and negative binomial distributions
on the left-hand side of the recursion, it can be seen that each of these three
distributions satisfies the recursion and that the values of $a$ and $b$ are as
given in Table 4.1. In addition the table gives the value of $p_0$, the starting
value for the recursion. Also in the table is the geometric distribution, the
one-parameter special case ($r = 1$) of the negative binomial distribution.

It can be shown (see Panjer and Willmot [106, Chapter 6]) that these are
the only possible distributions satisfying this recursive formula.
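The recursion translates directly into code. The sketch below generates probabilities from a triple $(a, b, p_0)$ and compares them with the Poisson and binomial probability functions computed directly, using the standard $(a, b, p_0)$ values for those two distributions. The function name and the parameter values are assumptions made for illustration.

```python
import math


def ab0_probabilities(a, b, p0, n_terms):
    """Apply p_k = p_{k-1} * (a + b / k) starting from p_0."""
    probs = [p0]
    for k in range(1, n_terms):
        probs.append(probs[-1] * (a + b / k))
    return probs


# Poisson with mean lam: a = 0, b = lam, p_0 = exp(-lam)
lam = 2.0
poisson_rec = ab0_probabilities(0.0, lam, math.exp(-lam), 8)
poisson_direct = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(8)]

# Binomial with parameters m and q: a = -q/(1-q), b = (m+1)q/(1-q), p_0 = (1-q)^m
m, q = 5, 0.3
binomial_rec = ab0_probabilities(-q / (1 - q), (m + 1) * q / (1 - q), (1 - q) ** m, m + 1)
binomial_direct = [math.comb(m, k) * q ** k * (1 - q) ** (m - k) for k in range(m + 1)]

print(poisson_rec[:4], poisson_direct[:4])
print(binomial_rec, binomial_direct)
```

For the binomial case the factor $a + b/k$ becomes zero at $k = m + 1$, which is how the recursion reproduces the finite support.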

The recursive formula can be rewritten as

$$k\, \frac{p_k}{p_{k-1}} = ak + b, \quad k = 1, 2, 3, \ldots.$$

The expression on the left-hand side is a linear function in $k$. Note from
Table 4.1 that the slope $a$ of the straight line is 0 for the Poisson distribution, is
negative for the binomial distribution, and is positive for the negative binomial
distribution, including the geometric. This suggests a graphical way of
indicating which of the three distributions might be selected for fitting. First,
[...] number of accidents under the policy is recorded in the table. Also recorded in the
table is the observed value of the quantity that should be linear.

Figure 4.8 plots the value of the quantity of interest against $k$, the number
of accidents. It can be seen from the graph that the quantity of interest
looks approximately linear except for the point at $k = 6$. The reliability of
the quantities as $k$ increases diminishes because the number of observations
becomes small and the variability of the results grows. This illustrates the
weakness of this ad hoc procedure. Visually, all the points appear to have
equal value. However, the points on the left are more reliable than the points
on the right. From the graph, it can be seen that the slope is positive and
the data appear approximately linear. This suggests the negative binomial
distribution is an appropriate model. Whether or not the slope is significantly
different from 0 is also not easily judged from the graph. By rescaling the
vertical axis of the graph, the slope can be made to look [...]
slope could be made to appear to be significantly different from 0. Graphically,
it is difficult to distinguish between the Poisson and the negative binomial [...]

Fig. 4.8 Plot of the ratio $k n_k/n_{k-1}$ against $k$.
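The plotted quantity is simple to compute from a frequency table. The sketch below uses hypothetical counts $n_k$ (invented for illustration, not the data set discussed in the text) and returns the points $(k, k n_k/n_{k-1})$ together with an unweighted least-squares slope. The unweighted fit shares the weakness noted for the graph, since it treats all points as equally reliable.

```python
def ratio_points(counts):
    """Return (k, k * n_k / n_{k-1}) for k >= 1 wherever n_{k-1} > 0."""
    return [(k, k * counts[k] / counts[k - 1])
            for k in range(1, len(counts)) if counts[k - 1] > 0]


def least_squares_slope(points):
    """Unweighted least-squares slope of the ratio against k."""
    n = len(points)
    mean_k = sum(k for k, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in points)
    sxx = sum((k - mean_k) ** 2 for k, _ in points)
    return sxy / sxx


# Hypothetical numbers of policies with k = 0, 1, 2, ... accidents.
counts = [7000, 1400, 350, 90, 25, 8]
points = ratio_points(counts)
slope = least_squares_slope(points)
print(points, slope)
```

A slope near 0 points toward the Poisson, a negative slope toward the binomial, and a positive slope toward the negative binomial.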

[...] because the Poisson requires a slope of 0. However, we [...]
binomial distribution is probably not a good choice since there [...]
evidence of a negative [...]
and negative binomial distributions and conduct a more formal [...]
appropriateness of the distributions by [...] variance to the mean. For this data set, the [...]
is sufficiently larger than the mean to warrant use of the negative binomial.
In order to do some formal analysis, Table 4.3 gives the results of maximum
likelihood estimation (to be discussed in Chapter 12) of the parameters of the
Poisson and negative binomial distributions and the negative loglikelihood in
each case. In Chapter 13 formal selection methods are presented. They would
indicate that the negative binomial is superior to the Poisson as a model for
this data set. However, those methods also indicate that the negative binomial
is not a particularly good model, and thus some of the distributions yet to be
introduced should be considered.

In subsequent subsections we will expand the class of the distributions
beyond the three discussed in this section by constructing more general models
related to the Poisson, binomial, and negative binomial distributions.

4.6.6 Truncation and modification at zero

At times, the distributions discussed previously do not adequately describe
the characteristics of some data sets encountered in practice. This may be [...]

Table 4.3 Poisson-negative binomial comparison

Distribution    Parameter estimates        -Loglikelihood
Poisson         $\hat{\lambda} = 0.2143537$         5,490.78

[...] are the same up to a constant of proportionality because $\sum_{k=1}^{\infty} p_k$ can be set
to any number in the interval $(0, 1]$. The remaining probability is at $k = 0$.

We will distinguish between the situations in which $p_0 = 0$ and those
where $p_0 > 0$. The first subclass is called the truncated (more specifically,
zero-truncated) distributions. The members are the zero-truncated Poisson,
zero-truncated binomial, and zero-truncated negative binomial (and its special
case, the zero-truncated geometric) distributions.

The second subclass will be referred to as the zero-modified distributions
because the probability is modified from that for the $(a, b, 0)$ class. These
distributions can be viewed as a mixture of an $(a, b, 0)$ distribution and a
degenerate distribution with all the probability at zero. Alternatively, they
can be called truncated with zeros distributions since the distribution can
be viewed as a mixture of a truncated distribution and a degenerate distribution
with all the probability at zero. We now show this more formally.
Note that all zero-truncated distributions can be considered as zero-modified
distributions, with the particular modification being to set $p_0 = 0$.

With three types of distributions, notation can become confusing. When
writing about discrete distributions in general, we will continue to let $p_k =
\Pr(N = k)$. When referring to a zero-truncated distribution, we will use $p_k^T$,
and when referring to a zero-modified distribution, we will use $p_k^M$. Once
again, it is possible for a zero-modified distribution to be a zero-truncated
distribution.

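Let $P(z) = \sum_{k=0}^{\infty} p_k z^k$ denote the pgf of a member of the $(a, b, 0)$ class.
Let $P^M(z) = \sum_{k=0}^{\infty} p_k^M z^k$ denote the pgf of the corresponding member of the
$(a, b, 1)$ class; that is,

$$p_k^M = c\, p_k, \quad k = 1, 2, 3, \ldots,$$

and $p_0^M$ is an arbitrary number. Then

$$\begin{aligned}
P^M(z) &= p_0^M + \sum_{k=1}^{\infty} p_k^M z^k \\
&= p_0^M + c \sum_{k=1}^{\infty} p_k z^k \\
&= p_0^M + c[P(z) - p_0].
\end{aligned}$$

Because $P^M(1) = P(1) = 1$,

$$1 = p_0^M + c(1 - p_0),$$

resulting in

$$c = \frac{1 - p_0^M}{1 - p_0} \quad \text{or} \quad p_0^M = 1 - c(1 - p_0).$$

Substituting this value of $c$ gives

$$P^M(z) = \left(1 - \frac{1 - p_0^M}{1 - p_0}\right)(1) + \frac{1 - p_0^M}{1 - p_0}\, P(z). \tag{4.10}$$

This is a weighted average of the pgfs of the degenerate distribution and
the corresponding $(a, b, 0)$ member. Furthermore,

$$p_k^M = \frac{1 - p_0^M}{1 - p_0}\, p_k, \quad k = 1, 2, \ldots. \tag{4.11}$$

Let $P^T(z)$ denote the pgf of the zero-truncated distribution corresponding to
an $(a, b, 0)$ pgf $P(z)$. Then, by setting $p_0^M = 0$ in (4.10) and (4.11),

$$P^T(z) = \frac{P(z) - p_0}{1 - p_0}$$

and

$$p_k^T = \frac{p_k}{1 - p_0}, \quad k = 1, 2, \ldots. \tag{4.12}$$

Then from (4.11)

$$p_k^M = (1 - p_0^M)\, p_k^T, \quad k = 1, 2, \ldots,$$

and

$$P^M(z) = p_0^M (1) + (1 - p_0^M) P^T(z).$$

Then the zero-modified distribution is also the weighted average of a degenerate
distribution and the zero-truncated member of the $(a, b, 0)$ class. The
following example illustrates these relationships.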

Example 4.44 Consider a negative binomial random variable with parameters
$\beta = 0.5$ and $r = 2.5$. Determine the first four probabilities for this random
variable. Then determine the corresponding probabilities for the zero-truncated
and zero-modified (with $p_0^M = 0.6$) versions.

From Table 4.4 on page 89 we have, for the negative binomial distribution,

$$p_0 = (1 + 0.5)^{-2.5} = 0.362887,$$
$$a = \frac{0.5}{1.5} = \frac{1}{3}, \qquad b = \frac{(2.5 - 1)(0.5)}{1.5} = \frac{1}{2}.$$

The first three recursions are

$$p_1 = 0.362887\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{1}\right) = 0.302406,$$
$$p_2 = 0.302406\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{2}\right) = 0.176404,$$
$$p_3 = 0.176404\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{3}\right) = 0.088202.$$

For the zero-truncated random variable, $p_0^T = 0$ by definition. The
recursions start with [from (4.12)] $p_1^T = 0.302406/(1 - 0.362887) = 0.474651$.
Then

$$p_2^T = 0.474651\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{2}\right) = 0.276880,$$
$$p_3^T = 0.276880\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{3}\right) = 0.138440.$$

If the original values were all available, then the zero-truncated probabilities
could have all been obtained by multiplying the original values by
$1/(1 - 0.362887) = 1.569580$.

For the zero-modified random variable, $p_0^M = 0.6$ arbitrarily. From (4.11),
$p_1^M = (1 - 0.6)(0.302406)/(1 - 0.362887) = 0.189860$. Then

$$p_2^M = 0.189860\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{2}\right) = 0.110752,$$
$$p_3^M = 0.110752\left(\tfrac{1}{3} + \tfrac{1}{2}\cdot\tfrac{1}{3}\right) = 0.055376.$$
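
The calculations in Example 4.44 can be checked with a short script. This is only a sketch: the function names `ab0_probabilities` and `zero_modified` are illustrative and do not come from the text, and the zero-truncated case is obtained by setting the modified probability at zero to 0.

```python
def ab0_probabilities(p0, a, b, n):
    """Return [p_0, ..., p_n] from the recursion p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, n + 1):
        probs.append((a + b / k) * probs[-1])
    return probs


def zero_modified(probs, p0_mod):
    """Place p0_mod at zero and rescale p_1, p_2, ... by (1 - p0_mod)/(1 - p_0)."""
    c = (1.0 - p0_mod) / (1.0 - probs[0])
    return [p0_mod] + [c * pk for pk in probs[1:]]


# Negative binomial with beta = 0.5 and r = 2.5, as in Example 4.44.
beta, r = 0.5, 2.5
a = beta / (1 + beta)
b = (r - 1) * beta / (1 + beta)
p = ab0_probabilities((1 + beta) ** (-r), a, b, 3)

p_trunc = zero_modified(p, 0.0)  # zero-truncated version
p_mod = zero_modified(p, 0.6)    # zero-modified version with 0.6 at zero

print([round(x, 6) for x in p])
print([round(x, 6) for x in p_trunc])
print([round(x, 6) for x in p_mod])
```

Because the script carries full floating-point precision through the recursion, its values can differ from the rounded hand calculations in the last printed digit.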
(see Exercise 4.45). The pgf of the logarithmic distribution is

$$P^T(z) = 1 - \frac{\ln[1 - \beta(z - 1)]}{\ln(1 + \beta)}. \tag{4.17}$$




bilities. 


It is also interesting that the special extreme case with $-1 < r < 0$ and
$\beta \to \infty$ is a proper distribution, and is sometimes called the Sibuya
distribution. It has pgf $P(z) = 1 - (1 - z)^{-r}$ and no moments exist (see
Exercise 4.47). Distributions with no moments are not particularly useful for
modeling claim numbers (unless the right tail is modified), because an infinite
number of claims is expected. This might be difficult to price.


modified version with $p_0^M = 0.6$ set arbitrarily.

Subsequent values are

$$p_2^T = \left(0.5 - \frac{0.75}{2}\right)(0.853553) = 0.106694,$$
$$p_3^T = \left(0.5 - \frac{0.75}{3}\right)(0.106694) = 0.026674.$$

For the modified probabilities, the truncated probabilities need to be
multiplied by 0.4 to produce $p_1^M = 0.341421$, $p_2^M = 0.042678$, and
$p_3^M = 0.010670$.

A reasonable question is to ask if there is a "natural" member of the ETNB
distribution, that is, one for which the recursion would begin with $p_1$ rather
than $p_2$. For that to be the case, $p_0$ would have to satisfy
$p_1 = (0.5 - 0.75/1)p_0 = -0.25p_0$. This would force one of the two probabilities
to be negative and so there is no acceptable solution. It is easy to show that
this occurs for any $r < 0$.

There are no other members of the $(a, b, 1)$ class beyond the ones discussed
above. A summary is given in Table 4.4.

4.6.7 Compound frequency models 

A larger class of distributions can be created by the process of compounding
any two discrete distributions. The term compounding reflects the idea that
the pgf of the new distribution $P(z)$ is written as

$$P(z) = P_N[P_M(z)], \tag{4.18}$$

where $P_N(z)$ and $P_M(z)$ are the pgfs of the primary and secondary
distributions, respectively.


$$P_S(z) = \sum_{n=0}^{\infty} \Pr(N = n) \sum_{k=0}^{\infty} \Pr(M_1 + \cdots + M_n = k \mid N = n) z^k$$
$$= \sum_{n=0}^{\infty} \Pr(N = n) [P_M(z)]^n$$
$$= P_N[P_M(z)].$$


In insurance contexts, this distribution can arise naturally. If $N$ represents the
number of accidents arising in a portfolio of risks and $\{M_k;\ k = 1, 2, \ldots, N\}$
represents the number of claims (injuries, number of cars, etc.) from the
accidents, then $S$ represents the total number of claims from the portfolio.
This kind of interpretation is not necessary to justify the use of a compound
distribution. If a compound distribution fits data well, that may be enough
justification itself. Also, there are other motivations for these distributions,
as presented in Section 4.6.12.


Example 4.46 Demonstrate that any zero-modified distribution is a compound
distribution.

Consider a primary Bernoulli distribution. It has pgf $P_N(z) = 1 - q + qz$.
Then consider an arbitrary secondary distribution with pgf $P_M(z)$. Then,
from (4.18) we obtain

$$P_S(z) = P_N[P_M(z)] = 1 - q + qP_M(z).$$

From (4.10) this is the pgf of a ZM distribution with

$$q = \frac{1 - p_0^M}{1 - p_0}.$$

That is, the ZM distribution has assigned arbitrary probability $p_0^M$ at zero,
while $p_0$ is the probability assigned at zero by the secondary distribution. □

Example 4.47 Consider the case where both the primary and secondary
distributions are Poisson. Determine the pgf of this distribution.

This distribution is called the Poisson-Poisson or Neyman Type A distribution.
Let $P_N(z) = e^{\lambda_1(z-1)}$ and $P_M(z) = e^{\lambda_2(z-1)}$. Then

$$P_S(z) = e^{\lambda_1[e^{\lambda_2(z-1)} - 1]}.$$

When $\lambda_2$ is a lot larger than $\lambda_1$ (for example, $\lambda_1 = 0.1$ and $\lambda_2 = 10$), the
resulting distribution will have two local modes. □
The probability of exactly $k$ claims can be written as

$$\Pr(S = k) = \sum_{n=0}^{\infty} \Pr(S = k \mid N = n) \Pr(N = n)$$
$$= \sum_{n=0}^{\infty} \Pr(M_1 + \cdots + M_N = k \mid N = n) \Pr(N = n)$$
$$= \sum_{n=0}^{\infty} \Pr(M_1 + \cdots + M_n = k) \Pr(N = n). \tag{4.19}$$

Letting $g_n = \Pr(S = n)$, $p_n = \Pr(N = n)$, and $f_n = \Pr(M = n)$, this is
rewritten as

$$g_k = \sum_{n=0}^{\infty} p_n f_k^{*n}, \tag{4.20}$$

where $f_k^{*n}$, $k = 0, 1, \ldots$, is the "n-fold convolution" of the function $f_k$,
$k = 0, 1, \ldots$, that is, the probability that the sum of $n$ random variables which are
each independent and identically distributed (i.i.d.) with probability function
$f_k$ will take on value $k$.
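
Formula (4.20) can be evaluated directly when the primary distribution has finite support. The sketch below is illustrative only: the helper names `convolve` and `compound_pf` and the numerical inputs are assumptions, not taken from the text. It uses a Bernoulli primary distribution with $q = 0.4$, so the result is also an instance of the zero-modified construction in Example 4.46.

```python
def convolve(f, g, kmax):
    """Probabilities of the sum of two independent counts, kept for 0..kmax."""
    out = [0.0] * (kmax + 1)
    for i, fi in enumerate(f[: kmax + 1]):
        for j, gj in enumerate(g[: kmax + 1 - i]):
            out[i + j] += fi * gj
    return out


def compound_pf(p, f, kmax):
    """g_k = sum_n p_n f_k^{*n} for k = 0..kmax, with a finite primary pf p."""
    g = [0.0] * (kmax + 1)
    f_n = [1.0] + [0.0] * kmax  # 0-fold convolution: all probability at zero
    for p_n in p:
        for k in range(kmax + 1):
            g[k] += p_n * f_n[k]
        f_n = convolve(f_n, f, kmax)  # move from the n-fold to the (n+1)-fold
    return g


primary = [0.6, 0.4]         # Bernoulli with q = 0.4 (illustrative values)
secondary = [0.2, 0.5, 0.3]  # an arbitrary secondary pf on {0, 1, 2}
g = compound_pf(primary, secondary, 2)
print(g)
```

Direct convolution is practical only for small problems; the recursive formulas that follow avoid computing the convolutions explicitly.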




Theorem 4.48 If the primary distribution is a member of the $(a, b, 0)$ class,
the recursive formula is

$$g_k = \frac{1}{1 - af_0} \sum_{j=1}^{k} \left(a + \frac{bj}{k}\right) f_j g_{k-j}, \quad k = 1, 2, 3, \ldots. \tag{4.22}$$

Proof: From (4.21),

$$np_n = a(n - 1)p_{n-1} + (a + b)p_{n-1}.$$

Multiplying each side by $[P_M(z)]^{n-1}P_M'(z)$ and summing over $n$ yields

$$\sum_{n=1}^{\infty} np_n[P_M(z)]^{n-1}P_M'(z) = a\sum_{n=1}^{\infty}(n - 1)p_{n-1}[P_M(z)]^{n-1}P_M'(z) + (a + b)\sum_{n=1}^{\infty} p_{n-1}[P_M(z)]^{n-1}P_M'(z).$$

Because $P_S(z) = \sum_{n=0}^{\infty} p_n[P_M(z)]^n$, the previous equation is

$$P_S'(z) = a\sum_{n=0}^{\infty} np_n[P_M(z)]^n P_M'(z) + (a + b)\sum_{n=0}^{\infty} p_n[P_M(z)]^n P_M'(z).$$

Therefore

$$P_S'(z) = aP_S'(z)P_M(z) + (a + b)P_S(z)P_M'(z).$$

Each side can be expanded in powers of $z$. The coefficients of $z^{k-1}$ in such
an expansion must be the same on both sides of the equation. Hence, for
$k = 1, 2, \ldots$ we have

$$kg_k = a\sum_{j=0}^{k}(k - j)f_j g_{k-j} + (a + b)\sum_{j=0}^{k} jf_j g_{k-j}$$
$$= akf_0 g_k + a\sum_{j=1}^{k}(k - j)f_j g_{k-j} + (a + b)\sum_{j=1}^{k} jf_j g_{k-j}$$
$$= akf_0 g_k + ak\sum_{j=1}^{k} f_j g_{k-j} + b\sum_{j=1}^{k} jf_j g_{k-j}.$$

Therefore,

$$g_k = af_0 g_k + \sum_{j=1}^{k}\left(a + \frac{bj}{k}\right) f_j g_{k-j}.$$

Rearrangement yields (4.22). □

In order to use (4.22), the starting value $g_0$ is required and is given in
Theorem 4.51 below. If the primary distribution is a member of the $(a, b, 1)$
class, the proof must be modified to reflect the fact that the recursion for the
primary distribution begins at $k = 2$. The result is the following.

Theorem 4.49 If the primary distribution is a member of the $(a, b, 1)$ class,
the recursive formula is

$$g_k = \frac{[p_1 - (a + b)p_0]f_k + \sum_{j=1}^{k}(a + bj/k)f_j g_{k-j}}{1 - af_0}, \quad k = 1, 2, 3, \ldots. \tag{4.23}$$

Proof: It is similar to the proof of Theorem 4.48 and is left to the reader. □

Example 4.50 Develop the recursive formula for the case where the primary
distribution is Poisson.

In this case $a = 0$ and $b = \lambda$, yielding the recursive form

$$g_k = \frac{\lambda}{k}\sum_{j=1}^{k} jf_j g_{k-j}.$$

The starting value is, from (4.18),

$$g_0 = \Pr(S = 0) = P(0) = P_N[P_M(0)] = P_N(f_0) = e^{-\lambda(1 - f_0)}. \tag{4.24}$$

Distributions of this type are called compound Poisson distributions. When
the secondary distribution is specified, the compound distribution is called
Poisson-X, where X is the name of the secondary distribution. □

The method used to obtain go applies to any compound distribution. 

Theorem 4.51 For any compound distribution, $g_0 = P_N(f_0)$, where $P_N(z)$ is
the pgf of the primary distribution and $f_0$ is the probability that the secondary
distribution takes on the value zero.

Proof: See the second line of (4.24). □


Note that the secondary distribution is not required to be in any special
form. However, to keep the number of distributions manageable, secondary
distributions will be selected from the $(a, b, 0)$ or the $(a, b, 1)$ class.

Example 4.52 Calculate the probabilities for the Poisson-ETNB distribution
where $\lambda = 3$ for the Poisson distribution and the ETNB distribution has
$r = -0.5$ and $\beta = 1$.

From Example 4.45 the secondary probabilities are $f_0 = 0$, $f_1 = 0.853553$,
$f_2 = 0.106694$, and $f_3 = 0.026674$. From (4.24), $g_0 = \exp[-3(1 - 0)] =
0.049787$. For the Poisson primary distribution, $a = 0$ and $b = 3$. The
recursive formula in (4.22) becomes

$$g_k = \frac{\sum_{j=1}^{k}(3j/k)f_j g_{k-j}}{1 - 0(0)} = \sum_{j=1}^{k}\frac{3j}{k}f_j g_{k-j}.$$

Then,

$$g_1 = \frac{3(1)}{1}0.853553(0.049787) = 0.127488,$$
$$g_2 = \frac{3(1)}{2}0.853553(0.127488) + \frac{3(2)}{2}0.106694(0.049787) = 0.179163,$$
$$g_3 = \frac{3(1)}{3}0.853553(0.179163) + \frac{3(2)}{3}0.106694(0.127488) + \frac{3(3)}{3}0.026674(0.049787) = 0.184114.$$
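
The recursion (4.22) is straightforward to program. The following is a minimal sketch, with the hypothetical function name `compound_ab0`, applied to the numbers of Example 4.52; the secondary probabilities are the rounded values quoted there, and only $g_0, \ldots, g_3$ are computed because only $f_0, \ldots, f_3$ are supplied.

```python
import math


def compound_ab0(a, b, g0, f, kmax):
    """Recursion (4.22): g_k = sum_{j=1}^k (a + b*j/k) f_j g_{k-j} / (1 - a*f_0)."""
    g = [g0]
    for k in range(1, kmax + 1):
        total = 0.0
        # f_j is treated as zero beyond the supplied list.
        for j in range(1, min(k, len(f) - 1) + 1):
            total += (a + b * j / k) * f[j] * g[k - j]
        g.append(total / (1.0 - a * f[0]))
    return g


lam = 3.0
f = [0.0, 0.853553, 0.106694, 0.026674]  # ETNB probabilities from Example 4.45
g0 = math.exp(-lam * (1.0 - f[0]))       # starting value from (4.24)
g = compound_ab0(0.0, lam, g0, f, 3)     # Poisson primary: a = 0, b = lambda
print([round(x, 6) for x in g])
```

The output agrees with the hand calculation up to rounding in the sixth decimal place, since the script does not round intermediate values.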


Example 4.53 Demonstrate that the Poisson-logarithmic distribution is a
negative binomial distribution.

The negative binomial distribution has pgf

$$P(z) = [1 - \beta(z - 1)]^{-r}.$$

Suppose $P_N(z)$ is Poisson($\lambda$) and $P_M(z)$ is logarithmic($\beta$); then

$$P_N[P_M(z)] = \exp\{\lambda[P_M(z) - 1]\}$$
$$= \exp\left\{\lambda\left[1 - \frac{\ln[1 - \beta(z - 1)]}{\ln(1 + \beta)} - 1\right]\right\}$$
$$= \exp\left\{\frac{-\lambda}{\ln(1 + \beta)}\ln[1 - \beta(z - 1)]\right\}$$
$$= [1 - \beta(z - 1)]^{-\lambda/\ln(1 + \beta)}$$
$$= [1 - \beta(z - 1)]^{-r},$$

where $r = \lambda/\ln(1 + \beta)$. This shows that the negative binomial distribution can
be written as a compound Poisson distribution with a logarithmic secondary
distribution. □
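
The identity in Example 4.53 can also be confirmed numerically. The sketch below makes several assumptions not stated in the text: the function names are illustrative, the logarithmic probabilities are computed from the closed form $[\beta/(1+\beta)]^k/[k\ln(1+\beta)]$, and the parameter values $r = 2.5$ and $\beta = 0.5$ are chosen only for the check.

```python
import math


def logarithmic_pf(beta, kmax):
    """Logarithmic probabilities f_0 = 0, f_k = [beta/(1+beta)]^k / (k ln(1+beta))."""
    ratio = beta / (1 + beta)
    return [0.0] + [ratio ** k / (k * math.log(1 + beta)) for k in range(1, kmax + 1)]


def negative_binomial_pf(r, beta, kmax):
    """Negative binomial probabilities from the (a, b, 0) recursion."""
    a = beta / (1 + beta)
    b = (r - 1) * beta / (1 + beta)
    probs = [(1 + beta) ** (-r)]
    for k in range(1, kmax + 1):
        probs.append((a + b / k) * probs[-1])
    return probs


def compound_poisson(lam, f, kmax):
    """Compound Poisson recursion of Example 4.50; f must have at least kmax + 1 entries."""
    g = [math.exp(-lam * (1.0 - f[0]))]
    for k in range(1, kmax + 1):
        g.append(lam / k * sum(j * f[j] * g[k - j] for j in range(1, k + 1)))
    return g


r, beta, kmax = 2.5, 0.5, 20
lam = r * math.log(1 + beta)  # so that r = lambda / ln(1 + beta)
compound = compound_poisson(lam, logarithmic_pf(beta, kmax), kmax)
direct = negative_binomial_pf(r, beta, kmax)
max_diff = max(abs(x - y) for x, y in zip(compound, direct))
print(max_diff)
```

The two probability vectors should agree to within floating-point error.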

The above example shows that the "Poisson-logarithmic" distribution does
not create a new distribution beyond the $(a, b, 0)$ and $(a, b, 1)$ classes. As a
result, this combination is of no use in expanding the class. The same is true
of the compound geometric distribution in which both the primary and secondary
distributions are geometric. The resulting distribution is a zero-modified
geometric distribution, as shown in Exercise 4.51. The following theorem shows
that certain other combinations are also of no use in expanding the class of
distributions through compounding. Suppose $P_S(z) = P_N[P_M(z); \theta]$ as before.
Now, $P_M(z)$ can always be written as

$$P_M(z) = f_0 + (1 - f_0)P_M^*(z), \tag{4.25}$$

where $P_M^*(z)$ is the pgf of the conditional distribution over the positive range
(in other words, the zero-truncated version).

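Theorem 4.54 Suppose the pgf $P_N(z; \theta)$ satisfies

$$P_N(z; \theta) = B[\theta(z - 1)]$$

for some parameter $\theta$ and some function $B(z)$ which is independent of $\theta$. That
is, the parameter $\theta$ and the argument $z$ only appear in the pgf as $\theta(z - 1)$.
There may be other parameters as well. Then $P_S(z) = P_N[P_M(z); \theta]$ can be
rewritten as $P_S(z) = P_N[P_M^*(z); \theta(1 - f_0)]$.

Proof:

$$P_S(z) = P_N[P_M(z); \theta]$$
$$= P_N[f_0 + (1 - f_0)P_M^*(z); \theta]$$
$$= B\{\theta[f_0 + (1 - f_0)P_M^*(z) - 1]\}$$
$$= B\{\theta(1 - f_0)[P_M^*(z) - 1]\}$$
$$= P_N[P_M^*(z); \theta(1 - f_0)]. \quad □$$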

This shows that adding, deleting, or modifying the probability at zero in the
secondary distribution does not add a new distribution because it is equivalent
to modifying the parameter $\theta$ of the primary distribution. This means that,
for example, a Poisson primary distribution with a Poisson, zero-truncated
Poisson, or zero-modified Poisson secondary distribution will still lead to a
Neyman Type A (Poisson-Poisson) distribution.

Example 4.55 Determine the probabilities for a Poisson-zero-modified ETNB
distribution where the parameters are $\lambda = 7.5$, $p_0^M = 0.6$, $r = -0.5$, and $\beta = 1$.

From Example 4.45 the secondary probabilities are $f_0 = 0.6$, $f_1 = 0.341421$,
$f_2 = 0.042678$, and $f_3 = 0.010670$. From (4.24), $g_0 = \exp[-7.5(1 - 0.6)] =
0.049787$. For the Poisson primary distribution, $a = 0$ and $b = 7.5$. The
recursive formula in (4.22) becomes

$$g_k = \frac{\sum_{j=1}^{k}(7.5j/k)f_j g_{k-j}}{1 - 0(0.6)} = \sum_{j=1}^{k}\frac{7.5j}{k}f_j g_{k-j}.$$

Then,

$$g_1 = \frac{7.5(1)}{1}0.341421(0.049787) = 0.127487,$$
$$g_2 = \frac{7.5(1)}{2}0.341421(0.127487) + \frac{7.5(2)}{2}0.042678(0.049787) = 0.179161,$$
$$g_3 = \frac{7.5(1)}{3}0.341421(0.179161) + \frac{7.5(2)}{3}0.042678(0.127487) + \frac{7.5(3)}{3}0.010670(0.049787) = 0.184112.$$

Except for slight rounding differences, these probabilities are the same as those
obtained in Example 4.52. □








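Physical motivation for the compound Poisson model arises from the fact that
the Poisson distribution is often a good model to describe the number of
claim-causing accidents, and the number of claims from an accident is often
itself a random variable. This physical motivation is discussed in the previous
subsection. However, there are also convenient mathematical properties enjoyed
by the compound Poisson class. In particular, those involving recursive
evaluation of the probabilities were also discussed in the previous subsection.
In addition, there is a close connection between the compound Poisson
distributions and the mixed Poisson frequency distributions, which is discussed
in more detail in Section 4.6.10. Here we consider some other properties of
these distributions. The compound Poisson pgf may be expressed as

$$P(z) = \exp\{\lambda[Q(z) - 1]\}, \tag{4.26}$$

where $Q(z)$ is the pgf of the secondary distribution.

The ETNB distribution has pgf

$$Q(z) = \frac{[1 - \beta(z - 1)]^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}}$$

for $\beta > 0$, $r > -1$, and $r \ne 0$. Then the Poisson-ETNB distribution has as
the logarithm of its pgf

$$\ln P(z) = \lambda\left\{\frac{[1 - \beta(z - 1)]^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}} - 1\right\}$$
$$= \lambda\left\{\frac{[1 - \beta(z - 1)]^{-r} - 1}{1 - (1 + \beta)^{-r}}\right\}$$
$$= \mu\{[1 - \beta(z - 1)]^{-r} - 1\},$$

where $\mu = \lambda/[1 - (1 + \beta)^{-r}]$.

We can compare the skewness (third moment) of these distributions to
develop an appreciation of the amount by which the skewness, and hence
the tails of these distributions, can vary even when the mean and variance
are fixed. From (4.26) (see Exercise 4.53) and Definition 3.4, the mean and
second and third central moments of the compound Poisson distribution are

$$\mu_1' = \mu = \lambda m_1', \qquad \mu_2 = \sigma^2 = \lambda m_2', \qquad \mu_3 = \lambda m_3', \tag{4.27}$$

where $m_j'$ is the $j$th raw moment of the secondary distribution. The coefficient
of skewness is

$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{m_3'}{\lambda^{1/2}(m_2')^{3/2}}.$$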


For the Poisson-binomial distribution, with a bit of algebra (see Exercise 4.54)
we obtain

$$\mu = \lambda mq,$$
$$\sigma^2 = \mu[1 + (m - 1)q],$$
$$\mu_3 = 3\sigma^2 - 2\mu + \frac{m - 2}{m - 1}\,\frac{(\sigma^2 - \mu)^2}{\mu}.$$

Carrying out similar exercises for the negative binomial, Polya-Aeppli,
Neyman Type A, and Poisson-ETNB distributions yields

Negative binomial: $\mu_3 = 3\sigma^2 - 2\mu + 2\,\dfrac{(\sigma^2 - \mu)^2}{\mu}$

Polya-Aeppli: $\mu_3 = 3\sigma^2 - 2\mu + \dfrac{3}{2}\,\dfrac{(\sigma^2 - \mu)^2}{\mu}$

Neyman Type A: $\mu_3 = 3\sigma^2 - 2\mu + \dfrac{(\sigma^2 - \mu)^2}{\mu}$

Poisson-ETNB: $\mu_3 = 3\sigma^2 - 2\mu + \dfrac{r + 2}{r + 1}\,\dfrac{(\sigma^2 - \mu)^2}{\mu}$

For the Poisson-ETNB distribution, the range of $r$ is $-1 < r < \infty$, $r \ne 0$.
Note that as $r \to 0$ the secondary distribution is logarithmic, resulting in the
negative binomial distribution.

Note that for fixed mean and variance the third moment only changes
through the coefficient in the last term for each of the five distributions. For
the Poisson distribution, $\mu_3 = \lambda = 3\sigma^2 - 2\mu$, and so the third term in each
expression for $\mu_3$ represents the change from the Poisson distribution. For the
Poisson-binomial distribution, if $m = 1$, the distribution is Poisson because
it is equivalent to a Poisson-zero-truncated binomial, as truncation at zero
leaves only probability at 1. Another view is that from (4.27) we have

$$\mu_3 = 3\sigma^2 - 2\mu + \frac{m - 2}{m - 1}\,\frac{(m - 1)^2 q^4 \lambda^2 m^2}{\lambda mq}
= 3\sigma^2 - 2\mu + (m - 2)(m - 1)q^3\lambda m,$$

which reduces to the Poisson value for $\mu_3$ when $m = 1$. Hence, it is necessary
that $m \ge 2$ for non-Poisson distributions to be created. Then the coefficient
satisfies

$$0 \le \frac{m - 2}{m - 1} < 1.$$

For the Poisson-ETNB, because $r > -1$, the coefficient satisfies

$$1 < \frac{r + 2}{r + 1} < \infty,$$

noting that when $r = 0$ this refers to the negative binomial distribution. For
the Neyman Type A distribution, the coefficient is exactly 1. Hence, these
three distributions provide any desired degree of skewness greater than that
of the Poisson distribution.



Example 4.57 The data in Table 4.5 give the distribution of the number of
claims on automobile insurance policies in Australia. Determine an appropriate
frequency model based on the skewness results of this section.

The mean, variance, and third central moment are 0.1254614, 0.1299599,
and 0.1401737, respectively. For these numbers,

$$\frac{\mu_3 - 3\sigma^2 + 2\mu}{(\sigma^2 - \mu)^2/\mu} = 7.543865.$$

From among the Poisson-binomial, negative binomial, Polya-Aeppli, Neyman
Type A, and Poisson-ETNB distributions, only the latter is appropriate. For
this distribution, an estimate of $r$ can be obtained from

$$7.543865 = \frac{r + 2}{r + 1},$$

resulting in $r = -0.8471851$. In Example 13.14 a more formal estimation and
selection procedure will be applied, but the conclusion will be the same. □
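
The moment calculation in this example is easy to reproduce from the frequencies in Table 4.5. The sketch below assumes that the variance and third central moment are computed with divisor $n$ (not $n - 1$), which is what reproduces the quoted values; the variable names are illustrative.

```python
# Observed frequencies from Table 4.5 (the "6+" row has frequency zero).
counts = {0: 565664, 1: 68714, 2: 5177, 3: 365, 4: 24, 5: 6}

n = sum(counts.values())
mean = sum(k * c for k, c in counts.items()) / n
var = sum((k - mean) ** 2 * c for k, c in counts.items()) / n
mu3 = sum((k - mean) ** 3 * c for k, c in counts.items()) / n

# Coefficient of (sigma^2 - mu)^2 / mu in the third-moment expressions.
coef = (mu3 - 3 * var + 2 * mean) / ((var - mean) ** 2 / mean)

# For the Poisson-ETNB, coef = (r + 2)/(r + 1), so r = (2 - coef)/(coef - 1).
r_hat = (2 - coef) / (coef - 1)

print(round(mean, 7), round(var, 7), round(mu3, 7))
print(coef, r_hat)
```

A coefficient well above 2 rules out the Poisson-binomial, negative binomial, Polya-Aeppli, and Neyman Type A models, whose coefficients are at most 2.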

A very useful property of the compound Poisson class of probability
distributions is the fact that it is closed under convolution. We have the following
theorem.

Theorem 4.58 Suppose that $S_i$ has a compound Poisson distribution with
Poisson parameter $\lambda_i$ and secondary distribution $\{q_n(i);\ n = 0, 1, 2, \ldots\}$ for
$i = 1, 2, 3, \ldots, k$. Suppose also that $S_1, S_2, \ldots, S_k$ are independent random
variables. Then $S = S_1 + S_2 + \cdots + S_k$ also has a compound Poisson
distribution with Poisson parameter $\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_k$ and secondary distribution
$\{q_n;\ n = 0, 1, 2, \ldots\}$, where $q_n = [\lambda_1 q_n(1) + \lambda_2 q_n(2) + \cdots + \lambda_k q_n(k)]/\lambda$.
Table 4.5 Hossack et al. data

No. of claims    Observed frequency
0                565,664
1                68,714
2                5,177
3                365
4                24
5                6
6+               0


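Proof: Let $Q_i(z) = \sum_{n=0}^{\infty} q_n(i)z^n$ for $i = 1, 2, \ldots, k$. Then $S_i$ has pgf
$P_{S_i}(z) = E(z^{S_i}) = \exp\{\lambda_i[Q_i(z) - 1]\}$. Because the $S_i$'s are independent,
$S$ has pgf

$$P_S(z) = \prod_{i=1}^{k} P_{S_i}(z)$$
$$= \prod_{i=1}^{k} \exp\{\lambda_i[Q_i(z) - 1]\}$$
$$= \exp\left[\sum_{i=1}^{k}\lambda_i Q_i(z) - \sum_{i=1}^{k}\lambda_i\right]$$
$$= \exp\{\lambda[Q(z) - 1]\},$$

where $\lambda = \sum_{i=1}^{k}\lambda_i$ and $Q(z) = \sum_{i=1}^{k}\lambda_i Q_i(z)/\lambda$. The result follows by the
uniqueness of the generating function. □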


One main advantage of this result is computational. If we are interested
in the sum of independent compound Poisson random variables, then we do not
need to compute the distribution of each compound Poisson random variable
separately (i.e., recursively using Example 4.50) because Theorem 4.58
implies that a single application of the compound Poisson recursive formula
in Example 4.50 will suffice. The following example illustrates this idea.

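Example 4.59 Suppose that $k = 2$ and $S_1$ has a compound Poisson
distribution with $\lambda_1 = 2$ and secondary distribution $q_1(1) = 0.2$, $q_2(1) = 0.7$, and
$q_3(1) = 0.1$. Also, $S_2$ (independent of $S_1$) has a compound Poisson
distribution with $\lambda_2 = 3$ and secondary distribution $q_2(2) = 0.25$, $q_3(2) = 0.6$, and
$q_4(2) = 0.15$. Determine the distribution of $S = S_1 + S_2$.

We have $\lambda = \lambda_1 + \lambda_2 = 2 + 3 = 5$. Then

$$q_1 = 0.4(0.2) + 0.6(0) = 0.08,$$
$$q_2 = 0.4(0.7) + 0.6(0.25) = 0.43,$$
$$q_3 = 0.4(0.1) + 0.6(0.6) = 0.40,$$
$$q_4 = 0.4(0) + 0.6(0.15) = 0.09.$$

Thus, $S$ has a compound Poisson distribution with Poisson parameter $\lambda = 5$
and secondary distribution $q_1 = 0.08$, $q_2 = 0.43$, $q_3 = 0.40$, and $q_4 = 0.09$.
Numerical values of the distribution of $S$ may be obtained using the recursive
formula

$$\Pr(S = x) = \frac{5}{x}\sum_{n=1}^{x} nq_n \Pr(S = x - n), \quad x = 1, 2, \ldots,$$

beginning with $\Pr(S = 0) = e^{-5}$. □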
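
As a rough illustration of Theorem 4.58, the following sketch combines the two secondary distributions of Example 4.59 and then applies the compound Poisson recursion once. The function name `compound_poisson` and the cutoff `xmax = 100` are assumptions made for the illustration.

```python
import math

lams = [2.0, 3.0]
secondaries = [
    {1: 0.2, 2: 0.7, 3: 0.1},    # q_n(1), secondary distribution of S_1
    {2: 0.25, 3: 0.6, 4: 0.15},  # q_n(2), secondary distribution of S_2
]

# Theorem 4.58: lambda is the sum, q_n is the lambda-weighted average.
lam = sum(lams)
q = [0.0] * 5
for lam_i, q_i in zip(lams, secondaries):
    for n, prob in q_i.items():
        q[n] += lam_i * prob / lam


def compound_poisson(lam, q, xmax):
    """Pr(S = x) = (lam/x) * sum_{n=1}^{x} n q_n Pr(S = x - n), with q_0 = 0."""
    g = [math.exp(-lam)]
    for x in range(1, xmax + 1):
        upper = min(x, len(q) - 1)  # q_n = 0 beyond the end of the list
        g.append(lam / x * sum(n * q[n] * g[x - n] for n in range(1, upper + 1)))
    return g


g = compound_poisson(lam, q, 100)
print([round(v, 2) for v in q])
print(g[:4], sum(g))
```

The starting value $e^{-\lambda}$ used here relies on the combined secondary distribution placing no probability at zero, as is the case in this example.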


In various situations the convolution of negative binomial distributions 
The distribution of $N$ may be computed recursively using the formula

$$\Pr(N = n) = \frac{\lambda}{n}\sum_{k=1}^{n} kq_k \Pr(N = n - k), \quad n = 1, 2, \ldots,$$

beginning with $\Pr(N = 0) = e^{-\lambda} = \prod_{i}(1 + \beta_i)^{-r_i}$ and with $\lambda$ and $q_n$ as
given previously.



It is not hard to see that Theorem 4.58 is a generalization of Theorem
4.37, which may be recovered with $q_1(i) = 1$ for $i = 1, 2, \ldots, k$. Similarly,
the decomposition result of Theorem 4.38 may also be extended to compound
Poisson random variables, where the decomposition is on the basis of the
region of support of the secondary distribution. See Panjer and Willmot
[106], Sec. 6.4 or Karlin and Taylor [72], Sec. 16.9 for further details.

4.6.9 Mixed frequency models

Many compound distributions can arise in a way that is very different from 
compounding. In this section, we examine mixture distributions by treating 
one or more parameters as being “random” in some sense. This section ex¬ 
pands on the ideas discussed in Section 4.6.3 in connection with the gamma 
mixture of the Poisson distribution being negative binomial. 

We assume that the parameter is distributed over the population under 
consideration (the collective) and that the sampling scheme that generates 
our data has two stages. First, a value of the parameter is selected. Then, 
given that parameter value, an observation is generated using that parameter 
value. 

In automobile insurance, for example, classification schemes attempt to put
individuals into (relatively) homogeneous groups for the purpose of pricing.
Variables used to develop the classification scheme might include age,
experience, a history of violations, accident history, and other variables. Because
there will always be some residual variation in accident risk within each class,
mixed distributions provide a framework for modeling this heterogeneity.

Let $P(z|\theta)$ denote the pgf of the number of events (e.g., claims) if the risk
parameter is known to be $\theta$. The parameter, $\theta$, might be the Poisson mean,
for example, in which case the measurement of risk is the expected number
of events in a fixed time period.

Let $U(\theta) = \Pr(\Theta \le \theta)$ be the cdf of $\Theta$, where $\Theta$ is the risk parameter,
which is viewed as a random variable. Then $U(\theta)$ represents the probability
that, when a value of $\Theta$ is selected (e.g., a driver is included in our sample),
the value of the risk parameter does not exceed $\theta$. Let $u(\theta)$ be the pf or pdf
of $\Theta$. Then

$$P(z) = \int P(z|\theta)u(\theta)\,d\theta \quad \text{or} \quad P(z) = \sum_j P(z|\theta_j)u(\theta_j) \tag{4.29}$$

is the unconditional pgf of the number of events (where the formula selected
depends on whether $\Theta$ is discrete or continuous). The corresponding
probabilities are denoted by

$$p_k = \int p_k(\theta)u(\theta)\,d\theta \quad \text{or} \quad p_k = \sum_j p_k(\theta_j)u(\theta_j). \tag{4.30}$$

The mixing distribution denoted by $U(\theta)$ may be of the discrete or
continuous type or even a combination of discrete and continuous types. Discrete
mixtures are mixtures of distributions when the mixing function is of the
discrete type. Similarly for continuous mixtures. This phenomenon was
introduced for continuous mixtures of severity distributions in Section 4.4.5
and for finite discrete mixtures in Section 4.2.3.

It should be noted that the mixing distribution is unobservable because the 
data are drawn from the mixed distribution. 

Example 4.61 Demonstrate that the zero-modified distributions may be
created by using a two-point mixture.

Suppose

$$P(z) = p(1) + (1 - p)P_2(z).$$

This is a (discrete) two-point mixture of a degenerate distribution that places
all probability at zero and a distribution with pgf $P_2(z)$. From (4.25) this is
also a compound Bernoulli distribution. □

Many mixed models can be constructed beginning with a simple distribu- 
Then the mixed distribution has probabilities

$$p_k = \frac{\Gamma(a + b)\Gamma(m + 1)\Gamma(a + k)\Gamma(b + m - k)}{\Gamma(a)\Gamma(b)\Gamma(k + 1)\Gamma(m - k + 1)\Gamma(a + b + m)}
= \frac{\binom{-a}{k}\binom{-b}{m - k}}{\binom{-a - b}{m}}, \quad k = 0, 1, \ldots, m.$$


Example 4.63 Determine the pf for a mixed negative binomial distribution
with mixing on the parameter $p = (1 + \beta)^{-1}$. Let $p$ have a beta distribution.

The mixed distribution has probabilities

$$p_k = \frac{\Gamma(r + k)}{\Gamma(r)\Gamma(k + 1)}\,\frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,\frac{\Gamma(a + r)\Gamma(b + k)}{\Gamma(a + r + b + k)}, \quad k = 0, 1, 2, \ldots.$$

When $b = 1$, this distribution is called the Waring distribution. When $r =
b = 1$, it is termed the Yule distribution. □


4.6.10 Poisson mixtures 


If we let $p_k(\theta)$ in (4.30) have the Poisson distribution, this leads to a class of
distributions with useful properties. A simple example of a Poisson mixture
is the two-point mixture.

Example 4.64 Suppose drivers can be classified as "good drivers" and "bad
drivers," each group with its own Poisson distribution. Determine the pf for
this model and fit it to the data from Example 12.56. This model and its
application to the data set are from Tröbliger [130].

From (4.30) the pf is

$$p_k = p\,\frac{e^{-\lambda_1}\lambda_1^k}{k!} + (1 - p)\,\frac{e^{-\lambda_2}\lambda_2^k}{k!}.$$

The maximum likelihood estimates were calculated by Tröbliger to be
$\hat{p} = 0.94$, $\hat{\lambda}_1 = 0.11$ and $\hat{\lambda}_2 = 0.70$. This means that about 94% of drivers
were "good" with a risk of $\lambda_1 = 0.11$ expected accidents per year and 6%
were "bad" with a risk of $\lambda_2 = 0.70$ expected accidents per year. Note that
it is not possible to return to the data set and identify which were the bad
drivers.

This example illustrates two important points about finite mixtures. First,
the model is probably oversimplified in the sense that risks (e.g., drivers)
probably exhibit a continuum of risk levels rather than just two. The second
point is that finite mixture models have a lot of parameters to be estimated.
The simple two-point Poisson mixture above has three parameters. Increasing
the number of distributions in the mixture to $r$ will then involve $r - 1$ mixing
parameters in addition to the total number of parameters in the $r$ component
distributions. As a result of this, continuous mixtures are frequently preferred.

The class of mixed Poisson distributions has some interesting properties
that will be developed here.

Let $P(z)$ be the pgf of a mixed Poisson distribution with arbitrary mixing
distribution $U(\theta)$. Then (with formulas given for the continuous case), by
introducing a scale parameter $\lambda$, we have

$$P(z) = \int e^{\lambda\theta(z - 1)}u(\theta)\,d\theta = \int \left[e^{\lambda(z - 1)}\right]^{\theta}u(\theta)\,d\theta
= E\left\{\left[e^{\lambda(z - 1)}\right]^{\Theta}\right\} = M_\Theta[\lambda(z - 1)], \tag{4.31}$$

where $M_\Theta(z)$ is the mgf of the mixing distribution.

Therefore, $P'(z) = \lambda M_\Theta'[\lambda(z - 1)]$ and with $z = 1$ we obtain $E(N) = \lambda E(\Theta)$,
where $N$ has the mixed Poisson distribution. Also, $P''(z) = \lambda^2 M_\Theta''[\lambda(z - 1)]$,
implying that $E[N(N - 1)] = \lambda^2 E(\Theta^2)$ and therefore

$$\mathrm{Var}(N) = E[N(N - 1)] + E(N) - [E(N)]^2$$
$$= \lambda^2 E(\Theta^2) + E(N) - \lambda^2[E(\Theta)]^2$$
$$= \lambda^2\,\mathrm{Var}(\Theta) + E(N)$$
$$> E(N),$$

and thus for mixed Poisson distributions the variance is always greater than
the mean.
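
The mean and variance relationships above can be checked on the two-point mixture of Example 4.64. The sketch below takes $\lambda = 1$ and lets $\Theta$ take the values 0.11 and 0.70 with probabilities 0.94 and 0.06; the function names and the truncation of the support at 40 are assumptions made for the illustration.

```python
import math

p, lam1, lam2 = 0.94, 0.11, 0.70  # estimates quoted in Example 4.64


def poisson_pf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mixture_pf(k):
    """Two-point Poisson mixture from (4.30)."""
    return p * poisson_pf(lam1, k) + (1 - p) * poisson_pf(lam2, k)


# The support is truncated at 40; the omitted probability is negligible here.
probs = [mixture_pf(k) for k in range(41)]
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum(k * k * pk for k, pk in enumerate(probs)) - mean ** 2

# Moments of the mixing variable Theta (scale parameter lambda = 1).
theta_mean = p * lam1 + (1 - p) * lam2
theta_var = p * lam1 ** 2 + (1 - p) * lam2 ** 2 - theta_mean ** 2

print(mean, theta_mean)
print(var, theta_var + mean)
```

The first printed pair illustrates $E(N) = \lambda E(\Theta)$ and the second illustrates $\mathrm{Var}(N) = \lambda^2\mathrm{Var}(\Theta) + E(N)$.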

Douglas [29] proves that for any mixed Poisson distribution the mixing
distribution is unique. This means that two different mixing distributions
cannot lead to the same mixed Poisson distribution. This allows us to identify
the mixing distribution in some cases.

There is also an important connection between mixed Poisson distributions
and compound Poisson distributions.

Definition 4.65 A distribution is said to be infinitely divisible if for all
values of $n = 1, 2, 3, \ldots$ its characteristic function $\varphi(z)$ can be written as

$$\varphi(z) = [\varphi_n(z)]^n,$$

where $\varphi_n(z)$ is the characteristic function of some random variable.

In other words, taking the $(1/n)$th power of the characteristic function still
results in a characteristic function. The characteristic function is defined as
follows.

Definition 4.66 The characteristic function of a random variable $X$ is

$$\varphi_X(z) = E(e^{izX}) = E(\cos zX + i\sin zX),$$

where $i = \sqrt{-1}$.

In Definition 4.65, “characteristic function” could have been replaced by 
“moment generating function” or “probability generating function,” or some 
other transform. That is, if the definition is satisfied for one of these trans¬ 
forms, it will be satisfied for all others which exist for the particular random 
variable. We choose the characteristic function because it exists for all dis¬ 
tributions while the moment generating function does not exist for some dis¬ 
tributions with heavy tails. Because many earlier results involved probability 
generating functions, it is useful to note the relationship between it and the 
characteristic function. 

Theorem 4.67 If the probability generating function exists for a random
variable $X$, then $P_X(z) = \varphi_X(-i\ln z)$ and $\varphi_X(z) = P_X(e^{iz})$.

Proof:

$$P_X(z) = E(z^X) = E(e^{X\ln z}) = E[e^{-i(i\ln z)X}] = \varphi_X(-i\ln z)$$

and

$$\varphi_X(z) = E(e^{izX}) = E[(e^{iz})^X] = P_X(e^{iz}). \quad □$$


The following distributions, among others, are infinitely divisible: normal,
gamma, Poisson, negative binomial. The binomial distribution is not
infinitely divisible because the exponent $m$ in its pgf must take on integer values.
Dividing $m$ by $n = 1, 2, 3, \ldots$ will result in nonintegral values. In fact, no
distributions with a finite range of support (the range over which positive
probabilities exist) can be infinitely divisible. Now to the important result.


Theorem 4.68 Suppose $P(z)$ is a mixed Poisson pgf with an infinitely
divisible mixing distribution. Then $P(z)$ is also a compound Poisson pgf and may
be expressed as

$$P(z) = e^{\lambda[P_2(z) - 1]},$$

where $P_2(z)$ is a pgf. If one adopts the convention that $P_2(0) = 0$, then $P_2(z)$
is unique.
A proof can be found in Feller [35], Ch. 12. If one chooses any infinitely
divisible mixing distribution, the corresponding mixed Poisson distribution
can be equivalently described as a compound Poisson distribution. For some
distributions, this is a distinct advantage when carrying out numerical work
because the recursive formula (4.22) can be used in evaluating the probabilities
once the secondary distribution is identified. For most cases, this identification
is easily carried out. A second advantage is that, because the same distribution
can be motivated in two different ways, a specific explanation is not required
in order to use it. Conversely, the fact that one of these models fits well does
not imply that the process generating the data is the result of mixing or
compounding. For example, the fact that the data follow a negative binomial
distribution does not imply that each individual has a Poisson distribution and
the Poisson parameter has a gamma distribution.

Example 4.69 Demonstrate that a gamma mixture of Poisson variables is
negative binomial.
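If the mixing distribution is gamma, it has the following moment generating
function (as derived in Example 3.15 and where $\beta$ plays the role of $1/\theta$):

$$M_\Theta(t) = \left(\frac{\beta}{\beta - t}\right)^{\alpha}, \quad \beta > 0,\ \alpha > 0,\ t < \beta.$$

It is clearly infinitely divisible because $[M_\Theta(t)]^{1/n}$ is the mgf of a gamma
distribution with parameters $\alpha/n$ and $\beta$. Then the pgf of the mixed Poisson
distribution is

$$P(z) = \left[\frac{\beta}{\beta - \lambda(z - 1)}\right]^{\alpha} = \left[1 - \frac{\lambda}{\beta}(z - 1)\right]^{-\alpha},$$

which is the form of the pgf of the negative binomial distribution where the
negative binomial parameter $r$ is equal to $\alpha$ and the parameter $\beta$ is equal to
$\lambda/\beta$. □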
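
A numerical check of this result is sketched below. It assumes the gamma density $\beta^{\alpha}\theta^{\alpha - 1}e^{-\beta\theta}/\Gamma(\alpha)$ (the form whose mgf is given above), uses illustrative parameter values, and integrates with a simple composite Simpson rule written for the purpose; none of the function names come from the text.

```python
import math

alpha, rate, lam = 2.0, 4.0, 1.0  # gamma shape, gamma rate (beta in the text), Poisson scale


def integrand(theta, k):
    """Poisson(lam * theta) probability of k times the gamma density at theta."""
    poisson = math.exp(-lam * theta) * (lam * theta) ** k / math.factorial(k)
    gamma_pdf = rate ** alpha * theta ** (alpha - 1) * math.exp(-rate * theta) / math.gamma(alpha)
    return poisson * gamma_pdf


def simpson(func, lo, hi, steps):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def negative_binomial_pf(r, beta, k):
    coef = math.gamma(r + k) / (math.gamma(r) * math.factorial(k))
    return coef * (beta / (1 + beta)) ** k * (1 + beta) ** (-r)


# Mixed Poisson probabilities by numerical integration over theta in [0, 20].
mixed = [simpson(lambda t, k=k: integrand(t, k), 0.0, 20.0, 20000) for k in range(5)]
# Negative binomial with r = alpha and beta = lam / rate.
nb = [negative_binomial_pf(alpha, lam / rate, k) for k in range(5)]
print(mixed)
print(nb)
```

The two lists should agree to many decimal places, with any difference due to the numerical integration.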


It was shown in Example 4.53 that a compound Poisson distribution with a
logarithmic secondary distribution is a negative binomial distribution.
Therefore the theorem holds true for this case. It is not difficult to see that, if $u(\theta)$
is the pf for any discrete random variable with pgf $P_\Theta(z)$, then the pgf of the
mixed Poisson distribution is $P_\Theta[e^{\lambda(z - 1)}]$, a compound distribution with a
Poisson secondary distribution.

Example 4.70 Demonstrate that the Neyman Type A distribution can be
obtained by mixing.

If in (4.31) the mixing distribution has pgf

$$P_\Theta(z) = e^{\mu(z - 1)},$$

then the mixed Poisson distribution has pgf

$$P(z) = \exp\{\mu[e^{\lambda(z - 1)} - 1]\},$$

the pgf of a compound Poisson with a Poisson secondary distribution, that is,
the Neyman Type A distribution. □


A further interesting result obtained by Holgate [60] is that, if a mixing
distribution is absolutely continuous and unimodal, then the resulting mixed
Poisson distribution is also unimodal. Multimodality can occur when c 


• The reader should try this calculation for various 
parameters. 

butions in this book involve a scale parameter. This 


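Example 4.71 Show that a mixed Poisson with an inverse Gaussian mixing
distribution is the same as a Poisson-ETNB distribution with $r = -0.5$.

The inverse Gaussian distribution is described in Appendix A. It has pdf

$$f(x) = \left(\frac{\theta}{2\pi x^3}\right)^{1/2}\exp\left[-\frac{\theta}{2x}\left(\frac{x - \mu}{\mu}\right)^2\right], \quad x > 0,$$

which is conveniently rewritten as

$$f(x) = \frac{\mu}{(2\pi\beta x^3)^{1/2}}\exp\left[-\frac{(x - \mu)^2}{2\beta x}\right], \quad x > 0,$$

where $\beta = \mu^2/\theta$. The mgf of this distribution is (see Exercise 3.24)

$$M(t) = \exp\left\{-\frac{\mu}{\beta}\left[(1 - 2\beta t)^{1/2} - 1\right]\right\}.$$

Hence, the inverse Gaussian distribution is infinitely divisible ($[M(t)]^{1/n}$ is
the mgf of an inverse Gaussian distribution with $\mu$ replaced by $\mu/n$). From
(4.31) with $\lambda = 1$, the pgf of the mixed distribution is

$$P(z) = \exp\left(-\frac{\mu}{\beta}\left\{[1 + 2\beta(1 - z)]^{1/2} - 1\right\}\right).$$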










Table 4.6 Pairs of compound and mixed Poisson distributions

Name                        Compound secondary distribution    Mixing distribution
Negative binomial           logarithmic                        gamma
Neyman-A                    Poisson                            Poisson
Poisson-inverse Gaussian    ETNB (r = -0.5)                    inverse Gaussian






tion with $r = -\tfrac{1}{2}$.

Hence, the Poisson-inverse Gaussian distribution is a compound Poisson
distribution with an ETNB ($r = -\tfrac{1}{2}$) secondary distribution. □

The relationships between mixed and compound Poisson distributions are 
given in Table 4.6. 

In this chapter, we focused on distributions that are easily handled
computationally. Although many other discrete distributions are available, we
believe that those discussed form a sufficiently rich class for most problems.

4.6.11 Effect of exposure on frequency 

Assume that the current portfolio consists of $n$ entities, each of which could
produce claims. Let $N_j$ be the number of claims produced by the $j$th entity.
Then $N = N_1 + \cdots + N_n$. If we assume that the $N_j$ are independent and
identically distributed, then

$$P_N(z) = [P_{N_1}(z)]^n.$$

Now suppose the portfolio is expected to expand to $n^*$ entities with
frequency $N^*$. Then

$$P_{N^*}(z) = [P_{N_1}(z)]^{n^*} = [P_N(z)]^{n^*/n}.$$

Thus, if $N$ is infinitely divisible, the distribution of $N^*$ will have the same
form as that of $N$, but with modified parameters.

Example 4.72 It has been determined from past studies that the number of
workers compensation claims for a group of 300 employees in a certain
occupation class has the negative binomial distribution with $\beta = 0.3$ and $r = 10$.
Determine the frequency distribution for a group of 500 such individuals.

The pgf of $N^*$ is

$$P_{N^*}(z) = [P_N(z)]^{500/300} = \{[1 - 0.3(z - 1)]^{-10}\}^{500/300} = [1 - 0.3(z - 1)]^{-16.67},$$

which is negative binomial with $\beta = 0.3$ and $r = 16.67$.



place large amounts of probability at zero. 

又 4.6.12 An inventory of discrete distributions 

We have introduced the simple (a, b, 0) class, generalized to the (a, b, 1) class,
and then used compounding and mixing to create a larger class of
distributions. Calculation of the probabilities of these distributions can be
carried out by using simple recursive procedures. In this section we note that
there are relationships among the various distributions similar to those of
Section 4.5.2. The specific relationships are given in Table 4.7.
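
The recursive calculation mentioned above can be sketched for the (a, b, 0) class, where $p_k = (a + b/k)\,p_{k-1}$. The helper names below are hypothetical; the (a, b, p_0) values for the Poisson and negative binomial are the standard ones for the parameterization used in this text, and the parameter values in the usage lines are arbitrary.

```python
import math


def ab0_probabilities(a, b, p0, kmax):
    """Return [p_0, ..., p_kmax] from the recursion p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, kmax + 1):
        probs.append((a + b / k) * probs[-1])
    return probs


def poisson_ab0(lam):
    """(a, b, p_0) for the Poisson distribution with mean lam."""
    return 0.0, lam, math.exp(-lam)


def negative_binomial_ab0(r, beta):
    """(a, b, p_0) for the negative binomial with parameters r and beta."""
    a = beta / (1.0 + beta)
    b = (r - 1.0) * beta / (1.0 + beta)
    return a, b, (1.0 + beta) ** (-r)


poisson_probs = ab0_probabilities(*poisson_ab0(2.0), kmax=5)
nb_probs = ab0_probabilities(*negative_binomial_ab0(3, 0.5), kmax=5)
```

Only the starting value and the pair (a, b) change from one member of the class to another, which is what makes the recursion convenient in practice.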

It is clear from earlier developments that members of the (a, b, 0) class
are special cases of members of the (a, b, 1) class and that zero-truncated
distributions are special cases of zero-modified distributions. The limiting
cases are best discovered through the probability generating function, as was
done on page 79 where the Poisson distribution is shown to be a limiting case
of the negative binomial distribution.

We have not listed compound distributions where the primary distribution
is one of the two-parameter models such as the negative binomial or
Poisson-inverse Gaussian. This was done because these distributions are often
themselves compound Poisson distributions and, as such, are generalizations of
distributions already presented. This collection forms a particularly rich set
of distributions in terms of shape. However, many other distributions are
also possible. Many others are discussed in Johnson, Kotz, and Kemp [69],
Douglas [29], and Panjer and Willmot [106].

4.6.13 Exercises 

4.40 For each of the data sets in Exercises 12.96 and 12.98 on page 400,
calculate values similar to those in Table 4.2. For each, determine the most
appropriate model from the (a, b, 0) class.
Table 4.7 Relationships among discrete distributions

Distribution               Is a special case of      Is a limiting case of
--------------------------------------------------------------------------
Poisson                    ZM Poisson                negative binomial,
                                                     Poisson-binomial,
                                                     Poisson-inv. Gaussian,
                                                     Polya-Aeppli (a),
                                                     Neyman-A (b)
ZT Poisson                 ZM Poisson                ZT negative binomial
ZM Poisson                                           ZM negative binomial
geometric                  negative binomial,        geometric-Poisson
                           ZM geometric
ZT geometric               ZT negative binomial
ZM geometric               ZM negative binomial
logarithmic                                          ZT negative binomial
ZM logarithmic                                       ZM negative binomial
binomial                   ZM binomial
negative binomial          ZM negative binomial,
                           Poisson-ETNB
Poisson-inverse Gaussian   Poisson-ETNB
Polya-Aeppli               Poisson-ETNB
Neyman-A                                             Poisson-ETNB

(a) Also called Poisson-geometric.
(b) Also called Poisson-Poisson.


4.41 Calculate Pr(N = 0), Pr(N = 1), and Pr(N = 2) for each of the
following distributions.

(a) Poisson(λ = 4)

(b) Geometric(β = 4)

(c) Negative binomial(r = 2, β = 2)

(d) Binomial(m = 8, q = 0.5)

(e) Logarithmic(β = 4)

(f) ETNB(r = -0.5, β = 4)

(g) Poisson-inverse Gaussian(λ = 2, β = 4)

(h) Zero-modified geometric($p_0^M$ = 0.5, β = 4)

(i) Poisson-Poisson (Neyman Type A)($\lambda_{primary}$ = 4, $\lambda_{secondary}$ = 1)

(j) Poisson-ETNB(λ = 4, r = 2, β = 0.5)

(k) Poisson-zero-modified geometric distribution (λ = 8, $p_0^M$ = 0.5, r =
2, β = 0.5)


4.42 The moment generating function (mgf) for discrete variables is
defined as

$$M_N(z) = E(e^{zN}) = \sum_{k=0}^{\infty} p_k e^{zk}.$$

Demonstrate that $P_N(z) = M_N(\ln z)$. Use the fact that $E(N^k) = M_N^{(k)}(0)$ to
show that $P'(1) = E(N)$ and $P''(1) = E[N(N-1)]$.

4.43 Use your knowledge of the permissible ranges for the parameters of the
Poisson, negative binomial, and binomial to determine all possible values of
a and b for these members of the (a, b, 0) class. Because these are the only
members of the class, all other pairs must not lead to a legitimate probability
distribution (nonnegative values that sum to 1). Show that the pair a = -1
and b = 1.5 (which is not on the list of possible values) does not lead to a
legitimate distribution.

4.44 Show that for the negative binomial distribution with any β > 0 and
r > -1, but r ≠ 0, the successive values of $p_k$ given by (4.15) are, for any $p_1$,
positive and $\sum_{k=1}^{\infty} p_k < \infty$.

4.45 Show that when, in the zero-truncated negative binomial distribution,
r → 0 the pf is as given in (4.16).

4.46 Show that the pgf of the logarithmic distribution is as given in (4.17). 

4.47 Show that for the Sibuya distribution, which is the ETNB distribution
with -1 < r < 0 and β → ∞, the mean does not exist (that is, the sum which
defines the mean does not converge). Because this random variable takes on
nonnegative values, this also shows that no other positive moments exist.

4.48 A frequency model that has not been mentioned to this point is the zeta
distribution. It is a zero-truncated distribution with $p_k^T = k^{-(\rho+1)}/\zeta(\rho+1)$,
k = 1, 2, ..., ρ > 0. The denominator is the zeta function, which must be
evaluated numerically as $\zeta(\rho+1) = \sum_{k=1}^{\infty} k^{-(\rho+1)}$. The zero-modified zeta
distribution can be formed in the usual way. More information can be found
in Luong and Doray [88]. Verify that the zeta distribution is not a member
of the (a, b, 1) class.

4.49 Do all the members of the (a, b, 0) class satisfy the condition of Theorem
4.54? For those that do, identify the parameter (or function of its parameters)
that plays the role of θ in the theorem.

4.50 For i = 1, ..., n let $S_i$ have independent compound Poisson frequency
distributions with Poisson parameter $\lambda_i$ and a secondary distribution with pgf
$P_2(z)$. Note that all n of the variables have the same secondary distribution.
Determine the distribution of $S = S_1 + \cdots + S_n$.

4.51 Show that the following three distributions are identical: (1)
geometric-geometric, (2) Bernoulli-geometric, (3) zero-modified geometric. That is, for
any one of the distributions with arbitrary parameters, show that there is a
member of the other two distribution types that has the same pf or pgf.

4.52 Show that the binomial-geometric and negative binomial-geometric (with
negative binomial parameter r a positive integer) distributions are identical.

4.53 Show that, for any pgf, $P^{(k)}(1) = E[N(N-1)\cdots(N-k+1)]$ provided
the expectation exists. Here $P^{(k)}(z)$ indicates the kth derivative. Use this
result to confirm the three moments as given in (4.27).

4.54 Verify the three moments as given in (4.28). 

4.55 Show that the negative binomial-Poisson compound distribution is the
same as a mixed Poisson distribution with a negative binomial mixing
distribution.

4.56 For i = 1, ..., n let $N_i$ have a mixed Poisson distribution with parameter
λ. Let the mixing distribution for $N_i$ have pgf $P_i(z)$. Show that
$N = N_1 + \cdots + N_n$ has a mixed Poisson distribution and determine the pgf of the
mixing distribution.

4.57 Let N have a Poisson distribution with (given that Θ = θ) parameter
λθ. Let the distribution of the random variable Θ have a scale parameter.
Show that the mixed distribution does not depend on the value of λ.

4.58 Let N have a Poisson distribution with (given that Θ = θ) parameter
θ. Let the distribution of the random variable Θ have pdf
$u(\theta) = \alpha^2(\alpha+1)^{-1}(\theta+1)e^{-\alpha\theta}$, θ > 0. Determine the pf of the mixed distribution. Also,
show that the mixed distribution is also a compound distribution.

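4.59 For the discrete counting random variable N with probabilities
$p_n = \Pr(N = n)$; n = 0, 1, 2, ..., let $a_n = \Pr(N > n) = \sum_{k=n+1}^{\infty} p_k$;
n = 0, 1, 2, ....

(a) Demonstrate that $E(N) = \sum_{n=0}^{\infty} a_n$.

(b) Demonstrate that $A(z) = \sum_{n=0}^{\infty} a_n z^n$ and $P(z) = \sum_{n=0}^{\infty} p_n z^n$ are
related by $A(z) = [1 - P(z)]/(1 - z)$. What happens as z → 1?

(c) Suppose that N has the negative binomial distribution

$$p_n = \binom{n+r-1}{n}\left(\frac{1}{1+\beta}\right)^r\left(\frac{\beta}{1+\beta}\right)^n, \quad n = 0, 1, 2, \ldots,$$

where r is a positive integer. Prove that

$$a_n = \beta \sum_{k=1}^{r} \binom{n+k-1}{n}\left(\frac{1}{1+\beta}\right)^k\left(\frac{\beta}{1+\beta}\right)^n, \quad n = 0, 1, 2, \ldots.$$

(d) Suppose that N has the Sibuya distribution with pgf
$P(z) = 1 - (1 - z)^{-r}$, -1 < r < 0. Prove that

$$p_n = \frac{(-r)\Gamma(n+r)}{n!\,\Gamma(1+r)}, \quad n = 1, 2, 3, \ldots,$$

and that

$$a_n = \binom{n+r}{n}, \quad n = 0, 1, 2, \ldots.$$

(e) Suppose that N has the mixed Poisson distribution with

$$p_n = \Pr(N = n) = \int_0^{\infty} \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,dU(\theta), \quad n = 0, 1, 2, \ldots,$$

where U(θ) is a cumulative distribution function. Prove that

$$a_n = \lambda \int_0^{\infty} \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,[1 - U(\theta)]\,d\theta, \quad n = 0, 1, 2, \ldots.$$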

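4.60 Consider the mixed Poisson distribution

$$p_n = \Pr(N = n) = \int_0^1 \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,U'(\theta)\,d\theta, \quad n = 0, 1, \ldots,$$

where $U(\theta) = 1 - (1-\theta)^k$, 0 < θ < 1, k = 1, 2, ....

(a) Prove that

$$p_n = k e^{-\lambda} \sum_{m=0}^{\infty} \frac{\lambda^{m+n}(m+k-1)!}{m!\,(m+k+n)!}, \quad n = 0, 1, \ldots.$$

(b) Using Exercise 4.59 prove that

$$\Pr(N > n) = e^{-\lambda} \sum_{m=0}^{\infty} \frac{\lambda^{m+n+1}(m+k)!}{m!\,(m+k+n+1)!}.$$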


(c) When k = 1, prove that
Pn 




4.61 Consider the mixed Poisson distribution

$$p_n = \int_0^{\infty} \frac{(\lambda\theta)^n e^{-\lambda\theta}}{n!}\,u(\theta)\,d\theta, \quad n = 0, 1, \ldots,$$

where the pdf u(θ) is that of the positive stable distribution (see, for example,
Feller [36], pp. 448, 583) given by

$$u(\theta) = \frac{1}{\pi}\sum_{k=1}^{\infty} \frac{\Gamma(k\alpha+1)}{k!}(-1)^{k-1}\theta^{-k\alpha-1}\sin(k\alpha\pi), \quad \theta > 0,$$

where 0 < α < 1. The Laplace transform is $\int_0^{\infty} e^{-s\theta}u(\theta)\,d\theta = \exp(-s^{\alpha})$, s > 0.
Prove that $\{p_n;\ n = 0, 1, \ldots\}$ is a compound Poisson distribution with
Sibuya secondary distribution (this mixed Poisson distribution is sometimes
called a discrete stable distribution).

4.62 Consider a mixed Poisson distribution with a reciprocal inverse Gaussian
distribution as the mixing distribution.

(a) Use Exercise 4.36 to show that this distribution is the convolution
of a negative binomial distribution and a Poisson-ETNB distribution
with r = -1/2 (i.e., a Poisson-inverse Gaussian distribution).

(b) Show that the mixed Poisson distribution in (a) is a compound
Poisson distribution and identify the secondary distribution.



Frequency and severity with coverage modifications


5.1 INTRODUCTION


We have seen a variety of examples that involve functions of random variables.
In this chapter we relate those functions to insurance applications.
Throughout this chapter we assume that all random variables have support on all or
a subset of the nonnegative real numbers. At times in this chapter and later
in the text we will need to distinguish between a random variable that
measures the payment per loss (so zero is a possibility, taking place when there
is a loss without a payment) and a variable that measures the payment per
payment (the random variable is not defined when there is no payment). For
notation, a per-loss variable will be denoted $Y^L$ and a per-payment variable
will be denoted $Y^P$. When the distinction is not material (for example, setting
a maximum payment does not create a difference), the superscript will be left
off.


5.2 DEDUCTIBLES 

Insurance policies are often sold with a per-loss deductible of d. When the 
loss, x, is at or below d, the insurance pays nothing. When the loss is above 


Loss Models: From Data to Decisions, Second Edition. 

By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot 
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc. 
d, the insurance pays x - d. In the language of Chapter 3 such a deductible
can be defined as follows.

Definition 5.1 An ordinary deductible modifies a random variable into
either the excess loss or left censored and shifted variable (see Definition 3.6).
The difference depends on whether the result of applying the deductible is to
be per payment or per loss, respectively.


This concept has already been introduced along with formulas for
determining its moments. The per-payment variable is

$$Y^P = \begin{cases} \text{undefined}, & X \le d, \\ X - d, & X > d, \end{cases}$$

while the per-loss variable is

$$Y^L = \begin{cases} 0, & X \le d, \\ X - d, & X > d. \end{cases}$$

Note that the per-payment variable $Y^P = Y^L \mid Y^L > 0$. That is, the
per-payment variable is the per-loss variable conditioned on the loss being positive.
For the excess loss/per-payment variable, the density function is

$$f_{Y^P}(y) = \frac{f_X(y+d)}{S_X(d)}, \quad y > 0, \qquad (5.1)$$

noting that for a discrete distribution the density function need only be
replaced by the probability function. Other key functions are

$$S_{Y^P}(y) = \frac{S_X(y+d)}{S_X(d)},$$

$$F_{Y^P}(y) = \frac{F_X(y+d) - F_X(d)}{1 - F_X(d)},$$

$$h_{Y^P}(y) = \frac{f_X(y+d)}{S_X(y+d)} = h_X(y+d).$$

Note that as a per-payment variable the excess loss variable places no
probability at 0.

The left censored and shifted variable has discrete probability at zero of
$F_X(d)$, representing the probability that a payment of zero is made because
the loss did not exceed d. Above zero, the density function is

$$f_{Y^L}(y) = f_X(y+d), \quad y > 0, \qquad (5.2)$$

while the other key functions are (for $y \ge 0$)

$$S_{Y^L}(y) = S_X(y+d),$$

$$F_{Y^L}(y) = F_X(y+d).$$

It is important to recognize that when counting claims on a per payment, 


made (while the frequency of losses will be unchanged). The nature of these 
changes will be discussed in Section 5.6. 

Example 5.2 Determine similar quantities for a Pareto distribution with
α = 3 and θ = 2,000 for an ordinary deductible of 500.

Using the above formulas, for the excess loss variable,

$$f_{Y^P}(y) = \frac{3(2{,}000)^3(2{,}000 + y + 500)^{-4}}{(2{,}000)^3(2{,}000 + 500)^{-3}} = \frac{3(2{,}500)^3}{(2{,}500 + y)^4},$$

$$S_{Y^P}(y) = \left(\frac{2{,}500}{2{,}500 + y}\right)^3,$$

$$F_{Y^P}(y) = 1 - \left(\frac{2{,}500}{2{,}500 + y}\right)^3,$$

$$h_{Y^P}(y) = \frac{3}{2{,}500 + y}.$$

Note that this is a Pareto distribution with α = 3 and θ = 2,500. For the left
censored and shifted variable,

$$f_{Y^L}(y) = \begin{cases} 0.488, & y = 0, \\ \dfrac{3(2{,}000)^3}{(2{,}500 + y)^4}, & y > 0, \end{cases} \qquad
S_{Y^L}(y) = \begin{cases} 0.512, & y = 0, \\ \dfrac{(2{,}000)^3}{(2{,}500 + y)^3}, & y > 0, \end{cases}$$

$$F_{Y^L}(y) = \begin{cases} 0.488, & y = 0, \\ 1 - \dfrac{(2{,}000)^3}{(2{,}500 + y)^3}, & y > 0, \end{cases} \qquad
h_{Y^L}(y) = \begin{cases} \text{undefined}, & y = 0, \\ \dfrac{3}{2{,}500 + y}, & y > 0. \end{cases}$$

Figure 5.1 contains a plot of the densities. The modified densities are 
created as follows. For the excess loss variable, take the portion of the original 
density from 500 and above. Then shift it to start at zero and multiply it by 
a constant so that the area under it is still 1. The left censored and shifted 
variable also takes the original density function above 500 and shifts it to the 
u The remaining probability is concentrated at 

□ 






presented because it is not defined at zero, making it 
；excess loss variable the hazard rate function is simply 








[Figure 5.1: plot of the densities for Example 5.2. Recoverable annotation:
0.488 probability at zero for the left censored/shifted distribution.
Recoverable legend entry: left censored/shifted.]


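Because this modification is unique to insurance applications, we will use
per-payment and per-loss terminology. The per-loss variable is

$$Y^L = \begin{cases} 0, & X \le d, \\ X, & X > d, \end{cases}$$

while the per-payment variable is

$$Y^P = \begin{cases} \text{undefined}, & X \le d, \\ X, & X > d. \end{cases}$$

Note that, as usual, the per-payment variable is a conditional random variable.
The quantities derived above for the ordinary deductible are now

$$f_{Y^L}(y) = \begin{cases} F_X(d), & y = 0, \\ f_X(y), & y > d, \end{cases} \qquad
S_{Y^L}(y) = \begin{cases} S_X(d), & 0 \le y \le d, \\ S_X(y), & y > d, \end{cases}$$

$$F_{Y^L}(y) = \begin{cases} F_X(d), & 0 \le y \le d, \\ F_X(y), & y > d, \end{cases} \qquad
h_{Y^L}(y) = \begin{cases} 0, & 0 < y < d, \\ h_X(y), & y > d, \end{cases}$$

for the per-loss variable and

$$f_{Y^P}(y) = \frac{f_X(y)}{S_X(d)}, \quad y > d, \qquad
S_{Y^P}(y) = \begin{cases} 1, & 0 \le y \le d, \\ \dfrac{S_X(y)}{S_X(d)}, & y > d, \end{cases}$$

$$F_{Y^P}(y) = \begin{cases} 0, & 0 \le y \le d, \\ \dfrac{F_X(y) - F_X(d)}{1 - F_X(d)}, & y > d, \end{cases} \qquad
h_{Y^P}(y) = \begin{cases} 0, & 0 < y < d, \\ h_X(y), & y > d, \end{cases}$$

for the per-payment variable.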

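Example 5.4 Repeat Example 5.2 for a franchise deductible.

Using the above formulas for the per-payment variable, for y > 500,

$$f_{Y^P}(y) = \frac{3(2{,}000)^3(2{,}000 + y)^{-4}}{(2{,}000)^3(2{,}000 + 500)^{-3}} = \frac{3(2{,}500)^3}{(2{,}000 + y)^4},$$

$$S_{Y^P}(y) = \left(\frac{2{,}500}{2{,}000 + y}\right)^3,$$

$$F_{Y^P}(y) = 1 - \left(\frac{2{,}500}{2{,}000 + y}\right)^3,$$

$$h_{Y^P}(y) = \frac{3}{2{,}000 + y}.$$

For the per-loss variable,

$$f_{Y^L}(y) = \begin{cases} 0.488, & y = 0, \\ \dfrac{3(2{,}000)^3}{(2{,}000 + y)^4}, & y > 500, \end{cases} \qquad
S_{Y^L}(y) = \begin{cases} 0.512, & 0 \le y \le 500, \\ \dfrac{(2{,}000)^3}{(2{,}000 + y)^3}, & y > 500, \end{cases}$$

$$F_{Y^L}(y) = \begin{cases} 0.488, & 0 \le y \le 500, \\ 1 - \dfrac{(2{,}000)^3}{(2{,}000 + y)^3}, & y > 500, \end{cases} \qquad
h_{Y^L}(y) = \begin{cases} 0, & 0 < y < 500, \\ \dfrac{3}{2{,}000 + y}, & y > 500. \end{cases}$$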
Theorem 5.5 For an ordinary deductible, the expected cost per loss is
$E(X) - E(X \wedge d)$ and the expected cost per payment is
$[E(X) - E(X \wedge d)]/[1 - F(d)]$. For a franchise deductible the expected cost per
loss is $E(X) - E(X \wedge d) + d[1 - F(d)]$ and the expected cost per payment is
$[E(X) - E(X \wedge d)]/[1 - F(d)] + d$.


Proof: For the per-loss expectation with an ordinary deductible, we have,
from (3.7) and (3.10), that the expectation is $E(X) - E(X \wedge d)$. From (5.1)
and (5.2) we see that, to change to a per-payment basis, division by $1 - F(d)$








Expectations could be derived directly from the density functions obtained 
in Examples 5.2 and 5.4. Using Theorem 5.5 and recognizing that we have a 
Pareto distribution, we can also look up the required values (the formulas are 
in Appendix A). That is, 



With E(X) = 1,000 we have, for the ordinary deductible, the expected cost per
loss is 1,000 - 360 = 640 while the expected cost per payment is 640/0.512 =
1,250. For the franchise deductible the expectations are 640 + 500(1 - 0.488) =
896 and 1,250 + 500 = 1,750. □
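
The four expectations in Theorem 5.5 can be checked numerically. The sketch below assumes the Pareto formulas $F(x) = 1 - [\theta/(x+\theta)]^{\alpha}$ and $E(X \wedge x) = \frac{\theta}{\alpha-1}\{1 - [\theta/(x+\theta)]^{\alpha-1}\}$ (valid for α ≠ 1); the function and key names are hypothetical.

```python
def pareto_cdf(x, alpha, theta):
    """Pareto distribution function F(x) = 1 - (theta/(x+theta))**alpha."""
    return 1.0 - (theta / (x + theta)) ** alpha


def pareto_limited_mean(x, alpha, theta):
    """Limited expected value E(X ^ x), assuming alpha != 1."""
    return theta / (alpha - 1.0) * (1.0 - (theta / (x + theta)) ** (alpha - 1.0))


def deductible_costs(alpha, theta, d):
    """Expected costs from Theorem 5.5 for a Pareto loss (requires alpha > 1)."""
    mean = theta / (alpha - 1.0)
    excess = mean - pareto_limited_mean(d, alpha, theta)  # E(X) - E(X ^ d)
    surv = 1.0 - pareto_cdf(d, alpha, theta)               # 1 - F(d)
    return {
        "ordinary_per_loss": excess,
        "ordinary_per_payment": excess / surv,
        "franchise_per_loss": excess + d * surv,
        "franchise_per_payment": excess / surv + d,
    }


# Pareto with alpha = 3, theta = 2,000 and a deductible of 500.
costs = deductible_costs(3.0, 2000.0, 500.0)
```

With these inputs the dictionary should reproduce the values 640, 1,250, 896, and 1,750 quoted above, up to floating-point rounding.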


5.2.1 Exercises 

5.1 Perform the calculations in Example 5.2 for the following distribution
(which is Model 4 on page 15) using an ordinary deductible of 5,000.

$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}$$


5.2 Repeat Exercise 5.1 for a franchise deductible. 

5.3 Repeat Example 5.6 for the model in Exercise 5.1 and a 5,000 deductible. 


Table 5.1 Data for Exercise 5.5

x          F(x)     E(X ∧ x)
-----------------------------
10,000     0.60      6,000
15,000     0.70      7,700
22,500     0.80      9,500
32,500     0.90     11,000
∞          1.00     20,000


the ratio of the expected cost per loss for risk 2 to the expected cost per loss 
for risk 1 . 


, 000 . 


in the expected cost per payment when the deductible is raised. 


5.3 THE LOSS ELIMINATION RATIO AND THE EFFECT OF 
INFLATION FOR ORDINARY DEDUCTIBLES 


A ratio that can be meaningful in evaluating the impact of a deductible is the 
loss elimination ratio. 

Definition 5.7 The loss elimination ratio is the ratio of the decrease in 
the expected payment with an ordinary deductible to the expected payment 
without the deductible. 

While many types of coverage modifications can decrease the expected
payment, the term loss elimination ratio is reserved for the effect of changing
the deductible. Without the deductible, the expected payment is E(X). With
the deductible, the expected payment (from Theorem 5.5) is $E(X) - E(X \wedge d)$.
Therefore the loss elimination ratio is

$$\frac{E(X) - [E(X) - E(X \wedge d)]}{E(X)} = \frac{E(X \wedge d)}{E(X)},$$

provided E(X) exists.
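
When no closed form for $E(X \wedge d)$ is at hand, the ratio can be approximated from the survival function using the identity $E(X \wedge d) = \int_0^d S(x)\,dx$. The sketch below uses composite Simpson integration; the helper names are hypothetical, and the Pareto survival function with α = 3 and θ = 2,000 serves only as a test case.

```python
def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def loss_elimination_ratio(survival, d, mean):
    """LER = E(X ^ d) / E(X), with E(X ^ d) as the integral of S over [0, d]."""
    return simpson(survival, 0.0, d) / mean


def pareto_sf(x):
    """Survival function of a Pareto with alpha = 3, theta = 2,000."""
    return (2000.0 / (x + 2000.0)) ** 3


ler = loss_elimination_ratio(pareto_sf, 500.0, mean=1000.0)
```

For this Pareto case the limited expected value at 500 is 360, so the result should be very close to 0.36.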

Example 5.8 Determine the loss elimination ratio for the Pareto distribution
with α = 3 and θ = 2,000 with an ordinary deductible of 500.



Lation ratio of 360/1,000 = 0.36. 
broducine: an ordinary deductible 
Inflation increases costs, but it turns out that when there is a deductible
the effect of inflation is magnified. First, some events that formerly produced
losses below the deductible will now lead to payments. Second, the relative
effect of inflation is magnified because the deductible is subtracted after
inflation. For example, suppose an event formerly produced a loss of 600. With a
500 deductible the payment is 100. Inflation at 10% will increase the loss to
660 and the payment to 160, a 60% increase in the cost to the insurer.

Theorem 5.9 For an ordinary deductible of d after uniform inflation of 1 + r,
the expected cost per loss is

$$(1 + r)\{E(X) - E[X \wedge d/(1 + r)]\}.$$

If $F[d/(1+r)] < 1$, then the expected cost per payment is obtained by dividing
by $1 - F[d/(1 + r)]$.

Proof: After inflation, losses are given by the random variable Y = (1 + r)X.
From Theorem 4.19, $f_Y(y) = f_X[y/(1+r)]/(1+r)$ and $F_Y(y) = F_X[y/(1+r)]$.
Using (3.8),

$$\begin{aligned}
E(Y \wedge d) &= \int_0^d y f_Y(y)\,dy + d[1 - F_Y(d)] \\
&= \int_0^d \frac{y f_X[y/(1+r)]}{1+r}\,dy + d\left[1 - F_X\left(\frac{d}{1+r}\right)\right] \\
&= \int_0^{d/(1+r)} (1+r) x f_X(x)\,dx + d\left[1 - F_X\left(\frac{d}{1+r}\right)\right] \\
&= (1+r)\left\{\int_0^{d/(1+r)} x f_X(x)\,dx + \frac{d}{1+r}\left[1 - F_X\left(\frac{d}{1+r}\right)\right]\right\} \\
&= (1+r)\,E\left(X \wedge \frac{d}{1+r}\right),
\end{aligned}$$

where the third line follows from the substitution x = y/(1 + r). Noting
that E(Y) = (1 + r)E(X) completes the first statement of the theorem and
the per-payment result follows from the relationship between the distribution
functions of Y and X.

Example 5.10 Determine the effect of inflation at 10% on an ordinary
deductible of 500 applied to a Pareto distribution with α = 3 and θ = 2,000.

The expected cost per loss after inflation is 1.1(1,000 - 336.08) = 730.32, an
increase of 14.11%. On a per-payment basis we need

$$F_Y(500) = F_X(454.55) = 1 - \left(\frac{2{,}000}{2{,}000 + 454.55}\right)^3 = 0.459.$$

The expected cost per payment is 730.32/(1 - 0.459) = 1,350, an increase of
8%. □
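
The inflation adjustment of Theorem 5.9 is easy to check numerically. The following sketch again assumes the Pareto survival function and limited expected value formula (α ≠ 1); the function names are hypothetical, and setting r = 0 recovers the pre-inflation costs.

```python
def pareto_sf(x, alpha, theta):
    """Pareto survival function S(x) = (theta/(x+theta))**alpha."""
    return (theta / (x + theta)) ** alpha


def pareto_limited_mean(x, alpha, theta):
    """Limited expected value E(X ^ x), assuming alpha != 1."""
    return theta / (alpha - 1.0) * (1.0 - (theta / (x + theta)) ** (alpha - 1.0))


def cost_after_inflation(alpha, theta, d, r):
    """Return (cost per loss, cost per payment) under Theorem 5.9."""
    d_star = d / (1.0 + r)            # deductible expressed in pre-inflation units
    mean = theta / (alpha - 1.0)      # E(X), requires alpha > 1
    per_loss = (1.0 + r) * (mean - pareto_limited_mean(d_star, alpha, theta))
    per_payment = per_loss / pareto_sf(d_star, alpha, theta)
    return per_loss, per_payment


before = cost_after_inflation(3.0, 2000.0, 500.0, 0.0)
after = cost_after_inflation(3.0, 2000.0, 500.0, 0.10)
loss_increase = after[0] / before[0] - 1.0       # about 14.11%
payment_increase = after[1] / before[1] - 1.0    # about 8%
```

Both relative increases exceed or differ from the 10% inflation rate only because the deductible stays fixed at 500 while the losses grow.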

5.3.1 Exercises 

5.6 Determine the loss elimination ratio for the distribution given below with
an ordinary deductible of 5,000. This is the same model used in Exercise 5.1.

$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}$$

5.7 Determine the effect of inflation at 10% on an ordinary deductible of 
5,000 applied to the distribution in Exercise 5.6. 

5.8 (*) Losses have a lognormal distribution with μ = 7 and σ = 2. There
is a deductible of 2,000 and 10 losses are expected each year. Determine the
loss elimination ratio. If there is uniform inflation of 20% but the deductible
remains at 2,000, how many payments will be expected?

5.9 (*) Losses have a Pareto distribution with α = 2 and θ = k. There is
an ordinary deductible of 2k. Determine the loss elimination ratio before and
after 100% inflation.

5.10 (*) Losses have an exponential distribution with a mean of 1,000. There 
is a deductible of 500. Determine the amount by which the deductible would 
have to be raised to double the loss elimination ratio. 

5.11 (*) The values in Table 5.2 are available for a random variable X. There
is a deductible of 15,000 per loss and no policy limit. Determine the expected
cost per payment using X and then assuming 50% inflation (with the
deductible remaining at 15,000).

5.12 (*) Losses have a lognormal distribution with μ = 6.9078 and σ =
1.5174. Determine the ratio of the loss elimination ratio at 10,000 to the loss
elimination ratio at 1,000. Then determine the percentage increase in the
number of losses that exceed 1,000 if all losses are increased by 10%.

5.13 (*) Losses have a mean of 2,000. With a deductible of 1,000 the loss 
elimination ratio is 0.3. The probability of a loss being greater than 1,000 is 
0.4. Determine the average size of a loss that is less than or equal to 1,000. 
Table 5.2 Data for Exercise 5.11

x          F(x)     E(X ∧ x)
-----------------------------
10,000     0.60      6,000
15,000     0.70      7,700
22,500     0.80      9,500
∞          1.00     20,000


5.4 POLICY LIMITS 

The opposite of a deductible is a policy limit. The typical policy limit arises
in a contract where for losses below u the insurance pays the full loss but for
losses above u the insurance pays only u. The effect of the limit is to produce
a right censored random variable. It will have a mixed distribution with
distribution and density function given by (where Y is the random variable



theorem. 


All losses 
payments 


payment 


eral inflation rate. The effect is the opposite of the deductible—inflation is 
tempered, not exacerbated. 

Figure 5.2 shows the density function for the right censored random
variable. From 0 to 3,000 it matches the original Pareto distribution. The
probability of exceeding 3,000, $\Pr(X > 3{,}000) = (2{,}000/5{,}000)^3 = 0.064$, is
concentrated at 3,000.


A policy limit and an ordinary deductible go together in the sense that, 



of 500, the cost per loss to the policyholder is a random \ 
censored at 500. When the policy has a limit of 3,000 
payments axe a variable that is left truncated and shifte( 
deductible). The opposite of the franchise deductible is a 
truncates any losses (see Exercise 3.12). This coverage is 
(Would vou buv a policy that pays you nothing if your 1( 
ation on a policy limit of 150,000 on the
following distribution. This is the same distribution used in Exercises 5.1 and
5.6.

$$F_4(x) = \begin{cases} 0, & x < 0, \\ 1 - 0.3e^{-0.00001x}, & x \ge 0. \end{cases}$$


5.15 (*) Let X have a Pareto distribution with α = 2 and θ = 100. Determine
the range of the mean residual life function $e_X(d)$ as d ranges over all positive
numbers. Then let Y = 1.1X. Determine the range of the ratio $e_Y(d)/e_X(d)$
as d ranges over all positive numbers. Finally, let Z be X right censored at
500 (that is, a limit of 500 is applied to X). Determine the range of $e_Z(d)$ as
d ranges over the interval 0 to 500.


5.5 COINSURANCE, DEDUCTIBLES, AND LIMITS 


The final common coverage modification is coinsurance. In this case the in- 



ular, the coinsurance is applied last. For the contract illustrated above, the
policy limit is α(u - d), the maximum amount payable. In this definition, u
is the loss above which no additional benefits are paid and will be called the
maximum covered loss. For the per-payment variable, $Y^P$ is undefined for
X < d/(1 + r).

Previous results can be combined to produce the following theorem, given 
without proof. 


Theorem 5.13 For the per-loss variable,

$$E(Y^L) = \alpha(1 + r)\left[E\left(X \wedge \frac{u}{1+r}\right) - E\left(X \wedge \frac{d}{1+r}\right)\right].$$

The expected value of the per-payment variable is obtained as

$$E(Y^P) = \frac{E(Y^L)}{1 - F_X\left(\frac{d}{1+r}\right)}.$$

Higher moments are more difficult. The next theorem gives the formula for
the second moment. The variance can then be obtained by subtracting the
square of the mean.

Theorem 5.14 For the per-loss variable

$$E[(Y^L)^2] = \alpha^2(1 + r)^2\{E[(X \wedge u^*)^2] - E[(X \wedge d^*)^2] - 2d^*E(X \wedge u^*) + 2d^*E(X \wedge d^*)\},$$

where $u^* = u/(1 + r)$ and $d^* = d/(1 + r)$. For the second moment of the
per-payment variable, divide this expression by $1 - F_X(d^*)$.

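Proof: From the definition of $Y^L$,

$$E[(Y^L)^2] = \int_{d^*}^{u^*} \alpha^2[(1 + r)x - d]^2 f(x)\,dx + \int_{u^*}^{\infty} \alpha^2(u - d)^2 f(x)\,dx,$$

and so

$$\begin{aligned}
\frac{E[(Y^L)^2]}{\alpha^2} &= (1+r)^2\int_{d^*}^{u^*} x^2 f(x)\,dx - 2(1+r)d\int_{d^*}^{u^*} x f(x)\,dx \\
&\quad + d^2[F(u^*) - F(d^*)] + (u - d)^2[1 - F(u^*)] \\
&= (1 + r)^2\{E[(X \wedge u^*)^2] - u^{*2}[1 - F(u^*)] \\
&\quad - E[(X \wedge d^*)^2] + d^{*2}[1 - F(d^*)]\} \\
&\quad - 2(1 + r)^2 d^*\{E(X \wedge u^*) - u^*[1 - F(u^*)] \\
&\quad - E(X \wedge d^*) + d^*[1 - F(d^*)]\} \\
&\quad + (1 + r)^2 d^{*2}[F(u^*) - F(d^*)] \\
&\quad + (1 + r)^2(u^* - d^*)^2[1 - F(u^*)].
\end{aligned}$$

The terms that do not involve an expectation cancel, leaving

$$\frac{E[(Y^L)^2]}{[\alpha(1 + r)]^2} = E[(X \wedge u^*)^2] - E[(X \wedge d^*)^2] - 2d^*[E(X \wedge u^*) - E(X \wedge d^*)].$$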


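Example 5.15 Determine the mean and standard deviation per loss for a
Pareto distribution with α = 3 and θ = 2,000 with a deductible of 500 and a
policy limit of 2,500. Note that the maximum covered loss is u = 3,000.

From earlier examples, $E(X \wedge 500) = 360$ and $E(X \wedge 3{,}000) = 840$. The
second limited moment is

$$\begin{aligned}
E[(X \wedge u)^2] &= \int_0^u x^2\frac{3(2{,}000)^3}{(x + 2{,}000)^4}\,dx + u^2\left(\frac{2{,}000}{u + 2{,}000}\right)^3 \\
&= 3(2{,}000)^3\int_{2{,}000}^{u + 2{,}000} (y - 2{,}000)^2 y^{-4}\,dy + u^2\left(\frac{2{,}000}{u + 2{,}000}\right)^3 \\
&= 3(2{,}000)^3\left[-y^{-1} + 2{,}000y^{-2} - \frac{2{,}000^2}{3}y^{-3}\right]_{2{,}000}^{u + 2{,}000} + u^2\left(\frac{2{,}000}{u + 2{,}000}\right)^3 \\
&= 3(2{,}000)^3\left[-\frac{1}{u + 2{,}000} + \frac{2{,}000}{(u + 2{,}000)^2} - \frac{2{,}000^2}{3(u + 2{,}000)^3}\right] \\
&\quad + 3(2{,}000)^3\left[\frac{1}{2{,}000} - \frac{2{,}000}{2{,}000^2} + \frac{2{,}000^2}{3(2{,}000)^3}\right] + u^2\left(\frac{2{,}000}{u + 2{,}000}\right)^3.
\end{aligned}$$

Then, $E[(X \wedge 500)^2] = 160{,}000$ and $E[(X \wedge 3{,}000)^2] = 1{,}440{,}000$ and

E(Y) = 840 - 360 = 480,

E(Y^2) = 1,440,000 - 160,000 - 2(500)(840) + 2(500)(360) = 800,000,

for a variance of 800,000 - 480^2 = 569,600 and a standard deviation of
754.72. □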
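
Theorems 5.13 and 5.14 can be evaluated for any loss distribution once the limited moments are available. The sketch below obtains them numerically from the identity $E[(X \wedge u)^k] = \int_0^u k x^{k-1} S(x)\,dx$ using Simpson's rule; all function names are hypothetical, the coinsurance factor is called `coins` to avoid a clash with the Pareto α, and the test case is the Pareto with α = 3 and θ = 2,000, deductible 500, and maximum covered loss 3,000.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def limited_moment(survival, u, k):
    """E[(X ^ u)^k] computed as the integral of k x^(k-1) S(x) over [0, u]."""
    return simpson(lambda x: k * x ** (k - 1) * survival(x), 0.0, u)


def per_loss_moments(survival, d, u, coins=1.0, r=0.0):
    """First and second per-loss moments from Theorems 5.13 and 5.14."""
    d_star, u_star = d / (1.0 + r), u / (1.0 + r)
    m1_d, m1_u = limited_moment(survival, d_star, 1), limited_moment(survival, u_star, 1)
    m2_d, m2_u = limited_moment(survival, d_star, 2), limited_moment(survival, u_star, 2)
    first = coins * (1.0 + r) * (m1_u - m1_d)
    second = (coins * (1.0 + r)) ** 2 * (
        m2_u - m2_d - 2.0 * d_star * m1_u + 2.0 * d_star * m1_d
    )
    return first, second


def pareto_sf(x):
    """Survival function of a Pareto with alpha = 3, theta = 2,000."""
    return (2000.0 / (x + 2000.0)) ** 3


mean, second = per_loss_moments(pareto_sf, d=500.0, u=3000.0)
std_dev = math.sqrt(second - mean ** 2)
```

Numerical integration is used here only to keep the sketch distribution-agnostic; where closed-form limited moments exist they would normally be preferred.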


5.5.1 Exercises 




5.17 (*) The loss ratio (R) is defined as total losses (L) divided by earned
premiums (P). An agent will receive a bonus (B) if the loss ratio on his
business is less than 0.7. The bonus is given as B = P(0.7 - R)/3 if this
quantity is positive; otherwise it is zero. Let [...] L have a
Pareto distribution with parameters α = 3 and θ = 600,000. Determine the
expected value of the bonus.


higher by 10%. An insurance policy reimburses 100% of losses subject 
deductible of 11 up to a maximum reimbursement of 11. Determine the : 
of next year’s reimbursements to this year’s reimbursements. 


5.19 (*) Losses have an exponential distribution with a mean of 1,000. An
insurance company will pay the amount of each claim in excess of a deductible
of 100. Determine the variance of the amount paid by the insurance company
for one claim, including the possibility that the amount paid is zero.


5.20 (*) Total claims for a health plan have a Pareto distribution with α = 2
and θ = 500. The health plan implements an incentive to physicians that
will pay a bonus of 50% of the amount by which total claims are less than
500; otherwise no bonus is paid. It is anticipated that with the incentive plan
the claim distribution will change to become Pareto with α = 2 and θ = K.
With the new distribution it turns out that expected claims plus the expected
bonus is equal to expected claims prior to the bonus system. Determine the
value of K.


5.21 (*) In year a, total expected losses are 10,000,000. Individual losses in
year a have a Pareto distribution with α = 2 and θ = 2,000. A reinsurer
pays the excess of each individual loss over 3,000. For this, the reinsurer is
paid a premium equal to 110% of expected covered losses. In year b, losses
will experience 5% inflation over year a, but the frequency of losses will not
change. Determine the ratio of the premium in year b to the premium in year
a.


5.22 (*) Losses have a uniform distribution from 0 to 50,000. There is a
per-loss deductible of 5,000 and a policy limit of 20,000 (meaning that the
maximum covered loss is 25,000). Determine the expected payment given
that a payment has been made.


5.23 (*) Losses have a lognormal distribution with μ = 10 and σ = 1. For
losses below 50,000, no payment is made. For losses between 50,000 and
100,000, the full amount of the loss is paid. For losses in excess of 100,000 the
limit of 100,000 is paid. Determine the expected cost per loss.


5.6 THE IMPACT OF DEDUCTIBLES ON CLAIM FREQUENCY




An important component in analyzing the effect of policy modifications
pertains to the change in the frequency distribution of payments when the
deductible (ordinary or franchise) is imposed or changed. When a deductible
is imposed or increased, there will be fewer payments per period, while if a
deductible is lowered, there will be more payments.

—' — 一 quantify this process, providing it can be assumed that the im- 
coverage modifications does not affect the process that produces 
losses or the type of individual who will purchase insurance. For example,
those who buy a 250 deductible on an automobile property damage coverage
may (correctly) view themselves as less likely to be involved in an accident
than those who buy full coverage. Similarly, an employer may find that the
rate of permanent disability declines when reduced benefits are provided to
employees in the first few years of employment.

To begin, suppose Xj, the severity, represents the ground-i 
such, loss and there axe no coverage modifications. Let N L de 
of losses. Now consider a coverage modification such that v \ 


Bernoulli' . . 

Then $N^P = I_1 + \cdots + I_{N^L}$ represents the number of payments. If $I_1, I_2, \ldots$
are mutually independent and are also independent of $N^L$, then $N^P$ has a
compound distribution with $N^L$ as the primary distribution and a Bernoulli
secondary distribution. Thus

$$P_{N^P}(z) = P_{N^L}[P_{I_j}(z)] = P_{N^L}[1 + v(z - 1)].$$

In the important special case in which the distribution of $N^L$ depends on
a parameter θ such that

$$P_{N^L}(z) = P_{N^L}(z; \theta) = B[\theta(z - 1)],$$

where B(z) is functionally independent of θ (as in Theorem 4.54), then

$$\begin{aligned}
P_{N^P}(z) &= B[\theta(1 - v + vz - 1)] \\
&= B[v\theta(z - 1)] \\
&= P_{N^L}(z; v\theta).
\end{aligned}$$

This implies that $N^L$ and $N^P$ are both from the same parametric family and
only the parameter θ need be changed.

Example 5.16 Demonstrate that the above result applies to the negative
binomial distribution and illustrate the effect when a deductible of 250 is
imposed on a negative binomial distribution with r = 2 and β = 3. Assume that
losses have a Pareto distribution with α = 3 and θ = 1,000.

The negative binomial pgf is $P_{N^L}(z) = [1 - \beta(z - 1)]^{-r}$. Here β takes on the
role of θ in the result and $B(z) = (1 - z)^{-r}$. Then $N^P$ must have a negative
binomial distribution with $r^* = r$ and $\beta^* = v\beta$. For the particular situation
described,

$$v = 1 - F(250) = \left(\frac{1{,}000}{1{,}000 + 250}\right)^3 = 0.512$$

and so $r^* = 2$ and $\beta^* = 3(0.512) = 1.536$. □
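
The thinning result for the negative binomial can be confirmed numerically by comparing the two pgfs. In the sketch below the function names are hypothetical; `v` is the probability that a loss exceeds the deductible, taken from the Pareto survival function.

```python
def pareto_sf(x, alpha, theta):
    """Pareto survival function S(x) = (theta/(x+theta))**alpha."""
    return (theta / (x + theta)) ** alpha


def nb_pgf(z, r, beta):
    """Negative binomial pgf, P(z) = [1 - beta*(z - 1)]**(-r)."""
    return (1.0 - beta * (z - 1.0)) ** (-r)


def thin_negative_binomial(r, beta, v):
    """Parameters of the payment count N^P when each loss pays with probability v."""
    return r, v * beta


v = pareto_sf(250.0, 3.0, 1000.0)                 # probability a loss exceeds 250
r_star, beta_star = thin_negative_binomial(2, 3.0, v)

# The pgf of N^P is the loss pgf evaluated at 1 + v(z - 1); it should agree
# with the negative binomial pgf that uses the thinned beta.
for z in (0.0, 0.3, 0.7, 1.0):
    assert abs(nb_pgf(1.0 + v * (z - 1.0), 2, 3.0) - nb_pgf(z, r_star, beta_star)) < 1e-12
```

Here `v` should come out at 0.512 and `beta_star` at 1.536, matching the values in Example 5.16 up to rounding.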


This result may be generalized for zero-modified and zero-truncated
distributions. Suppose $N^L$ depends on parameters θ and α such that

$$P_{N^L}(z) = P_{N^L}(z; \theta, \alpha) = \alpha + (1 - \alpha)\frac{B[\theta(z - 1)] - B(-\theta)}{1 - B(-\theta)}. \qquad (5.3)$$

Note that $\alpha = P_{N^L}(0) = \Pr(N^L = 0)$ and so is the modified probability at
zero. It is also the case that, if $B[\theta(z - 1)]$ is itself a pgf, then the pgf given in
(5.3) is that for the corresponding zero-modified distribution. However, it is
not necessary for $B[\theta(z - 1)]$ to be a pgf in order for $P_{N^L}(z)$ as given in (5.3)
to be a pgf. In particular, $B(z) = 1 + \ln(1 - z)$ yields the zero-modified (ZM)




$$P_{N^P}(z) = P_{N^L}(z; v\theta, \alpha^*),$$

where $\alpha^* = \Pr(N^P = 0) = P_{N^P}(0) = P_{N^L}(1 - v; \theta, \alpha)$. It is
expected that imposing a deductible will increase the value of $\alpha$ because
periods with no payments will become more likely. In particular, if $N^L$ is zero
truncated, $N^P$ will be zero modified.


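Example 5.17 Repeat the previous example, only now let the frequency
distribution be zero-modified negative binomial with $r = 2$, $\beta = 3$, and
$p_0^M = 0.4$.

The pgf is

$$P_{N^L}(z) = p_0^M + (1 - p_0^M)\,\frac{[1 - \beta(z - 1)]^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}}.$$

Then $\alpha = p_0^M$ and $B(z) = (1 - z)^{-r}$. We then have $r^* = r$,
$\beta^* = v\beta$, and

$$p_0^{M*} = p_0^M + (1 - p_0^M)\,\frac{(1 + v\beta)^{-r} - (1 + \beta)^{-r}}{1 - (1 + \beta)^{-r}}
= \frac{p_0^M - (1 + \beta)^{-r} + (1 + v\beta)^{-r} - p_0^M (1 + v\beta)^{-r}}{1 - (1 + \beta)^{-r}}.$$

For the particular distribution given, the new parameters are $r^* = 2$,
$\beta^* = 3(0.512) = 1.536$, and

$$p_0^{M*} = \frac{0.4 - 4^{-2} + 2.536^{-2} - 0.4(2.536)^{-2}}{1 - 4^{-2}} = 0.4595. \qquad □$$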


In applications, it may be the case that we want to determine the
distribution of $N^L$ from that of $N^P$. For example, data may have been
collected on the number of payments in the presence of a deductible and from
that data the parameters of $N^P$ can be estimated. We may then want to know the
distribution of payments if the deductible is removed. Arguing as before,

$$P_{N^L}(z) = P_{N^P}(1 - v^{-1} + z v^{-1}).$$

This implies that the formulas derived previously hold with $v$ replaced by $1/v$.
However, it is possible that the resulting pgf for $N^L$ is not valid. In this case
one of the modeling assumptions is invalid (for example, the assumption that
changing the deductible does not change claim-related behavior).

Example 5.18 Suppose payments on a policy with a deductible of 250 have
the zero-modified negative binomial distribution with $r^* = 2$, $\beta^* = 1.536$,
and $p_0^{M*} = 0.4595$. Losses have the Pareto distribution with $\alpha = 3$ and
$\theta = 1{,}000$. Determine the distribution of the number of payments when the
deductible is removed. Repeat this calculation assuming $p_0^{M*} = 0.002$.

In this case the formulas use $v = 1/0.512 = 1.953125$ and so $r = 2$ and
$\beta = 1.953125(1.536) = 3$. Also,

$$p_0^M = \frac{0.4595 - 2.536^{-2} + 4^{-2} - 0.4595(4)^{-2}}{1 - 2.536^{-2}} = 0.4$$

as expected. For the second case,

$$p_0^M = \frac{0.002 - 2.536^{-2} + 4^{-2} - 0.002(4)^{-2}}{1 - 2.536^{-2}} = -0.1079,$$

which is not a legitimate probability. □
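
The zero-modified adjustment used in Examples 5.17 and 5.18 can be sketched as a single function. The name `zm_nb_adjust` and the validity check are illustrative assumptions, not part of the text; the formula coded is the zero-modified negative binomial expression for $p_0^{M*}$, applied with $v$ to impose the deductible and with $1/v$ to remove it.

```python
def zm_nb_adjust(p0m, r, beta, v):
    """Adjust a zero-modified negative binomial for payment probability v.

    Returns (p0m_star, r_star, beta_star). Passing 1/v in place of v
    reverses the adjustment (deductible removed).
    """
    old = (1 + beta) ** (-r)          # (1 + beta)^(-r)
    new = (1 + v * beta) ** (-r)      # (1 + v beta)^(-r)
    p0m_star = (p0m - old + new - p0m * new) / (1 - old)
    return p0m_star, r, v * beta


def is_valid_probability(p):
    return 0.0 <= p <= 1.0


v = 0.512
forward = zm_nb_adjust(0.4, 2, 3.0, v)            # Example 5.17
backward = zm_nb_adjust(0.4595, 2, 1.536, 1 / v)  # Example 5.18, first case
invalid = zm_nb_adjust(0.002, 2, 1.536, 1 / v)    # Example 5.18, second case

for label, result in (("forward", forward), ("backward", backward), ("invalid", invalid)):
    print(label, result, is_valid_probability(result[0]))
```

A negative result in the last case signals that the fitted payment distribution is inconsistent with the modeling assumptions, as the text notes.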


All members of the $(a, b, 0)$ and $(a, b, 1)$ classes meet the conditions of this
section. Table 5.3 indicates how the parameters change when moving from
$N^L$ to $N^P$. If $N^L$ has a compound distribution, then we can write
$P_{N^L}(z) = P_1[P_2(z)]$ and therefore

$$P_{N^P}(z) = P_{N^L}[1 + v(z - 1)] = P_1\{P_2[1 + v(z - 1)]\}.$$

Thus $N^P$ will also have a compound distribution with the secondary
distribution modified as indicated. If the secondary distribution has an
$(a, b, 0)$ distribution, then it can be modified as in Table 5.3. The following
example indicates the adjustment to be made if the secondary distribution has
an $(a, b, 1)$ distribution.

Table 5.3 Frequency adjustments

$N^L$                   Parameters for $N^P$

Poisson                 $\lambda^* = v\lambda$

ZM Poisson              $p_0^{M*} = \dfrac{p_0^M - e^{-\lambda} + e^{-v\lambda} - p_0^M e^{-v\lambda}}{1 - e^{-\lambda}}$,  $\lambda^* = v\lambda$

Binomial                $q^* = vq$

ZM binomial             $p_0^{M*} = \dfrac{p_0^M - (1 - q)^m + (1 - vq)^m - p_0^M (1 - vq)^m}{1 - (1 - q)^m}$,  $q^* = vq$

Negative binomial       $\beta^* = v\beta$,  $r^* = r$

ZM negative binomial    $p_0^{M*} = \dfrac{p_0^M - (1 + \beta)^{-r} + (1 + v\beta)^{-r} - p_0^M (1 + v\beta)^{-r}}{1 - (1 + \beta)^{-r}}$,  $\beta^* = v\beta$,  $r^* = r$

ZM logarithmic          $p_0^{M*} = 1 - (1 - p_0^M)\ln(1 + v\beta)/\ln(1 + \beta)$,  $\beta^* = v\beta$

and $r^* = 4$. This would be sufficient, except we have acquired the habit of
using the ETNB as the secondary distribution. From Theorem 4.54 a compound
Poisson distribution with a zero-modified secondary distribution is equivalent



isurance policy has an accidental death rider. For ordinary 
is 10,000; however, for accidental deaths, the benefit is
20,000. The insureds are approximately the same age, so it is reasonable
to assume they all have the same claim probabilities. Let them be 0.97 for
no claim, 0.01 for an ordinary death claim, and 0.02 for an accidental death
claim. A reinsurer has been asked to bid on providing an excess reinsurance
that will pay 10,000 for each accidental death.

(a) The claim process can be modeled with a frequency component that
has the Bernoulli distribution (the event is claim/no claim) and
a two-point severity component (the probabilities are associated
with the two claim levels, given that a claim occurred). Specify
the probability distributions for the frequency and severity random
variables.

(b) Suppose the reinsurer wants to retain the same frequency distribution.
Determine the modified severity distribution that will reflect
the reinsurer's payments.

(c) Determine the reinsurer's frequency and severity distributions when
the severity distribution is to be conditional on a reinsurance payment
being made.

5.25 Individual losses have a Pareto distribution with $\alpha = 2$ and
$\theta = 1{,}000$. With a deductible of 500 the frequency distribution for the
number of payments is Poisson-inverse Gaussian with $\lambda = 3$ and $\beta = 2$.
If the deductible is raised to 1,000, determine the distribution for the number
of payments. Also, determine the pdf of the severity distribution (per payment)
when the new deductible is in place.

5.26 Losses have a Pareto distribution with $\alpha = 2$ and $\theta = 1{,}000$. The
frequency distribution for a deductible of 500 is zero-truncated logarithmic with
$\beta = 4$. Determine a model for the number of payments when the deductible
is reduced to 0.

5.27 Suppose that the number of losses $N^L$ has the Sibuya distribution (see
Exercises 4.47 and 4.59) with pgf $P_{N^L}(z) = 1 - (1 - z)^{-r}$, where
$-1 < r < 0$. Demonstrate that the number of payments has a zero-modified Sibuya
distribution.


6 


Aggregate loss models 


6.1 INTRODUCTION 

The purpose of this chapter is to develop models of aggregate losses, the total
amount paid on all claims occurring in a fixed time period on a defined set
of insurance contracts. There are two ways to go about adding the claims in
order to obtain the total for the period.

One method is to record the payments as they are made and then add
them up. In that case we can represent the aggregate losses as a sum, $S$,
of a random number, $N$, of individual payment amounts $(X_1, X_2, \ldots, X_N)$.
Hence,

$$S = X_1 + X_2 + \cdots + X_N, \quad N = 0, 1, 2, \ldots, \qquad (6.1)$$

where $S = 0$ when $N = 0$.

Definition 6.1 The collective risk model has the representation in (6.1)
with the $X_j$s being independent and identically distributed (i.i.d.) random
variables, unless otherwise specified. More formally, the independence
assumptions are:

1. Conditional on $N = n$, the random variables $X_1, X_2, \ldots, X_n$ are i.i.d.
random variables.

2. Conditional on $N = n$, the common distribution of the random variables
$X_1, X_2, \ldots, X_n$ does not depend on $n$.


Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.




4. The impact on claims frequencies of changing deductibles is better
understood.

5. Data that are heterogeneous in terms of deductibles and limits can be
combined to obtain the hypothetical loss size distribution. This is useful
when data from several years in which policy provisions were changing
are combined.


noncovered losses
of any payments in excess of 50,000.

(a) the total loss preinsurance to the policyholder, (b) the aggregate loss to
the insurer prior to the reinsurance payment, (c) the aggregate loss to the
reinsurer, (d) the aggregate loss to the insurer after the reinsurance payment,
and (e) the aggregate loss to the insured.

(a) The aggregate losses with no insurance are $S = X_1 + X_2 + \cdots + X_N$,
where the $X_j$s have the distribution with pdf $f_X(x)$.

(b) The aggregate payments by the insurer (before recovery of reinsurance)
are $S = Y_1 + Y_2 + \cdots + Y_N$, where

$$Y_j = \begin{cases} 0, & X_j < 1{,}000, \\ 0.80(X_j - 1{,}000), & 1{,}000 \le X_j < 126{,}000, \\ 100{,}000, & X_j \ge 126{,}000. \end{cases}$$

(c) The aggregate payments by the reinsurer are $S = Y_1 + Y_2 + \cdots + Y_N$,
where

$$Y_j = \begin{cases} 0, & X_j < 63{,}500, \\ 0.80(X_j - 63{,}500), & 63{,}500 \le X_j < 126{,}000, \\ 50{,}000, & X_j \ge 126{,}000. \end{cases}$$

(d) The aggregate costs to the insurer after recovery of reinsurance
payments are $S = Y_1 + Y_2 + \cdots + Y_N$, where

$$Y_j = \begin{cases} 0, & X_j < 1{,}000, \\ 0.80(X_j - 1{,}000), & 1{,}000 \le X_j < 63{,}500, \\ 50{,}000, & X_j \ge 63{,}500. \end{cases}$$

(e) The aggregate costs to the insured, that is, the uninsured costs,
are $S = Y_1 + Y_2 + \cdots + Y_N$, where

$$Y_j = \begin{cases} X_j, & X_j < 1{,}000, \\ 800 + 0.20X_j, & 1{,}000 \le X_j < 126{,}000, \\ X_j - 100{,}000, & X_j \ge 126{,}000. \end{cases}$$

6.1.1 Exercise 

6.1 Show that the costs to the insured, the insurer, and the reinsurer in 
Example 6.3 sum to the aggregate loss. 


6.2 MODEL CHOICES

In many cases of fitting frequency or severity distributions to data, several
distributions may be good candidates for models. However, some distributions
may be preferable for a variety of practical reasons.

In general, it is useful for the severity distribution to be a scale distribution
(see Definition 4.2) because the choice of currency (e.g., U.S. dollars or British
pounds) should not affect the result. Also, scale families are easy to adjust
for inflationary effects over time (this is, in effect, a change in currency; e.g.,
1994 U.S. dollars to 1995 U.S. dollars). When forecasting the costs for a future
year, the anticipated rate of inflation can be factored in easily by adjusting
the parameters.

A similar consideration applies to frequency distributions. As a block of an
insurance company's business grows, the number of claims can be expected to
grow, all other things being equal. If one chooses models that have probability
generating functions of the form

$$P_N(z; \alpha) = Q(z)^\alpha \qquad (6.2)$$

for some parameter $\alpha$, then the expected number of claims is proportional
to $\alpha$. Increasing the volume of business by $100r\%$ results in expected claims
being proportional to $\alpha^* = (1 + r)\alpha$. This was discussed in Section 4.6.11.
Because $r$ is any value satisfying $r > -1$, the distributions satisfying (6.2)
should allow $\alpha$ to take on any positive values. Such distributions can be
shown to be infinitely divisible (see Definition 4.65).

A related consideration also suggests frequency distributions that are
infinitely divisible. This relates to the concept of invariance over the time period
of study. Ideally the model selected should not depend on the length of the
time period used in the study of claims frequency. The expected frequency
should be proportional to the length of the time period after any adjustment
for growth in business. This means that a study conducted over a period of
10 years can be used to develop claims frequency distributions for periods of a
month, a year, or any other period. Furthermore, the form of the distribution
for a one-year period is the same as for a one-month period with a change of
parameter. The parameter $\alpha$ corresponds to the length of a time period. For
example, if $\alpha = 1.7$ in (6.2) for a one-month period, then the identical model
with $\alpha = 20.4$ is an appropriate model for a one-year period.
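
The monthly-to-annual scaling can be illustrated numerically. The sketch below assumes, purely for illustration, a negative binomial frequency with pgf $[1 - \beta(z-1)]^{-r}$, which has the form (6.2) with $\alpha = r$; the value $\beta = 0.5$ and all function names are hypothetical. It convolves twelve independent monthly distributions with $r = 1.7$ and compares the result with the same family at $r = 20.4$.

```python
def nb_pmf(r, beta, kmax):
    """Negative binomial probabilities p_0..p_kmax via p_k = p_{k-1} (k + r - 1)/k * beta/(1 + beta)."""
    probs = [(1 + beta) ** (-r)]
    for k in range(1, kmax + 1):
        probs.append(probs[-1] * (k + r - 1) / k * beta / (1 + beta))
    return probs


def convolve(a, b, kmax):
    """Convolution of two pmfs on 0, 1, 2, ..., keeping terms up to kmax."""
    out = [0.0] * (kmax + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= kmax:
                out[i + j] += ai * bj
    return out


kmax, beta = 60, 0.5                      # beta is an arbitrary illustrative value
monthly = nb_pmf(1.7, beta, kmax)

annual_by_convolution = [1.0] + [0.0] * kmax   # point mass at zero
for _ in range(12):
    annual_by_convolution = convolve(annual_by_convolution, monthly, kmax)

annual_direct = nb_pmf(20.4, beta, kmax)
max_diff = max(abs(a - b) for a, b in zip(annual_by_convolution, annual_direct))
print(max_diff)
```

The first `kmax + 1` probabilities of a convolution depend only on the first `kmax + 1` probabilities of each factor, so the truncation does not affect the comparison.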

Distributions that have a modification at zero are not of the form (6.2).
However, it may still be desirable to use a zero-modified distribution if the
physical situation suggests it. For example, if a certain proportion of policies
never make a claim, due to duplication of coverage or other reason, it may be
appropriate to use this same proportion in future periods for a policy selected
at random.

6.2 For pgfs satisfying (6.2), show that the mean is proportional to $\alpha$.

6.3 Which of the distributions in Appendix B satisfy (6.2) for any positive
value of $\alpha$?


6.3 THE COMPOUND MODEL FOR AGGREGATE CLAIMS

Let $S$ denote aggregate losses associated with a set of $N$ observed claims
$X_1, X_2, \ldots, X_N$ satisfying the independence assumptions following (6.1). The
approach in this chapter is to:

1. Develop a model for the distribution of $N$ based on data.

2. Develop a model for the common distribution of the $X_j$s based on data.

3. Using these two models, carry out necessary calculations to obtain the
distribution of $S$.

Completion of the first two steps follows the ideas developed elsewhere in
this text. We now presume that these two models are developed and that
we only need to carry out numerical work in obtaining solutions to problems
associated with the distribution of $S$. These might involve pricing a stop-loss
reinsurance contract, and they require analyzing the impact of changes in
deductibles, coinsurance levels, and maximum payments on individual losses.
The random sum

$$S = X_1 + X_2 + \cdots + X_N$$

(where $N$ has a counting distribution) has a distribution function

$$F_S(x) = \Pr(S \le x) = \sum_{n=0}^{\infty} p_n \Pr(S \le x \mid N = n) = \sum_{n=0}^{\infty} p_n F_X^{*n}(x), \qquad (6.3)$$

where $F_X(x) = \Pr(X \le x)$ is the common distribution function of the $X_j$s
and $p_n = \Pr(N = n)$. In (6.3), $F_X^{*n}(x)$ is the "n-fold convolution" of the cdf
of $X$. It can be obtained as

$$F_X^{*0}(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

and

$$F_X^{*k}(x) = \int_{-\infty}^{\infty} F_X^{*(k-1)}(x - y)\, dF_X(y) \quad \text{for } k = 1, 2, \ldots. \qquad (6.4)$$

If $X$ is a continuous random variable with probability zero on negative values,
(6.4) reduces to

$$F_X^{*k}(x) = \int_0^x F_X^{*(k-1)}(x - y) f_X(y)\, dy \quad \text{for } k = 2, 3, \ldots.$$

For $k = 1$ this equation reduces to $F_X^{*1}(x) = F_X(x)$. By differentiating, the
pdf is

$$f_X^{*k}(x) = \int_0^x f_X^{*(k-1)}(x - y) f_X(y)\, dy \quad \text{for } k = 2, 3, \ldots.$$

In the case of discrete random variables with positive probabilities at
$0, 1, 2, \ldots$, Equation (6.4) reduces to

$$F_X^{*k}(x) = \sum_{y=0}^{x} F_X^{*(k-1)}(x - y) f_X(y) \quad \text{for } x = 0, 1, \ldots,\ k = 2, 3, \ldots.$$

The corresponding pf is

$$f_X^{*k}(x) = \sum_{y=0}^{x} f_X^{*(k-1)}(x - y) f_X(y) \quad \text{for } x = 0, 1, \ldots,\ k = 2, 3, \ldots.$$

The distribution (6.3) is called a compound distribution and the pf for
the distribution of aggregate losses is

$$f_S(x) = \sum_{n=0}^{\infty} p_n f_X^{*n}(x).$$

Arguing as in Section 4.6.7, the pgf of $S$ is

$$\begin{aligned}
P_S(z) = \mathrm{E}[z^S] &= \sum_{n=0}^{\infty} \mathrm{E}[z^{X_1 + X_2 + \cdots + X_n} \mid N = n] \Pr(N = n) \\
&= \sum_{n=0}^{\infty} \mathrm{E}\Big[\prod_{j=1}^{n} z^{X_j}\Big] \Pr(N = n) \\
&= \sum_{n=0}^{\infty} \Pr(N = n) [P_X(z)]^n \\
&= \mathrm{E}[P_X(z)^N] = P_N[P_X(z)] \qquad (6.5)
\end{aligned}$$

due to the independence of $X_1, \ldots, X_n$ for fixed $n$.

A similar relationship exists for the other generating functions. It is
sometimes more convenient to use the characteristic function

$$\varphi_S(z) = \mathrm{E}(e^{izS}) = P_N[\varphi_X(z)],$$

which always exists. Panjer and Willmot [106] use the Laplace transform

$$L_S(z) = \mathrm{E}(e^{-zS}) = P_N[L_X(z)],$$

which always exists for random variables defined on nonnegative values. With
regard to the moment generating function, we have

$$M_S(z) = P_N[M_X(z)].$$

The pgf of compound distributions was discussed in Section 4.6.7 where the

In the case where $P_N(z) = P_1[P_2(z)]$, that is, $N$ is itself a compound
distribution, $P_S(z) = P_1\{P_2[P_X(z)]\}$, which in itself produces no additional
difficulties.
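
The discrete convolution formulas and the compound pf can be sketched directly. The code below is a minimal illustration with hypothetical function names; it uses the severity and frequency probabilities of Example 6.6 (Tables 6.1 and 6.2, amounts in units of 25 dollars) so that the results can be compared with the moments quoted in that example.

```python
def convolve(a, b):
    """pf of the sum of two independent variables on 0, 1, 2, ... given their pfs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compound_pf(p_n, f_x):
    """f_S(x) = sum_n p_n f_X^{*n}(x) for a frequency pf with finite support."""
    size = (len(p_n) - 1) * (len(f_x) - 1) + 1
    f_s = [0.0] * size
    conv = [1.0]                      # f_X^{*0}: point mass at zero
    for p in p_n:
        for x, mass in enumerate(conv):
            f_s[x] += p * mass
        conv = convolve(conv, f_x)    # next convolution power
    return f_s


# Index = amount in units of 25 dollars (Table 6.1) and number of claims (Table 6.2).
f_x = [0.0, 0.150, 0.200, 0.250, 0.125, 0.075, 0.050, 0.050, 0.050, 0.025, 0.025]
p_n = [0.05, 0.10, 0.15, 0.20, 0.25, 0.15, 0.06, 0.03, 0.01]

f_s = compound_pf(p_n, f_x)
mean_s = sum(x * p for x, p in enumerate(f_s))
var_s = sum(x * x * p for x, p in enumerate(f_s)) - mean_s ** 2
print(f_s[:4], mean_s, var_s)
```

Because the frequency distribution here has finite support, the full pf of $S$ (on 0 through 80 units) is obtained, so the moments computed from it are exact up to floating-point error.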

From (6.5), the moments of $S$ can be obtained in terms of the moments of
$N$ and the $X_j$s. The first three moments are

$$\mathrm{E}(S) = \mu'_{S1} = \mu'_{N1}\mu'_{X1},$$

$$\mathrm{Var}(S) = \mu_{S2} = \mu'_{N1}\mu_{X2} + \mu_{N2}(\mu'_{X1})^2, \qquad (6.6)$$

$$\mathrm{E}\{[S - \mathrm{E}(S)]^3\} = \mu_{S3} = \mu'_{N1}\mu_{X3} + 3\mu_{N2}\mu'_{X1}\mu_{X2} + \mu_{N3}(\mu'_{X1})^3.$$

Here, the first subscript indicates the appropriate random variable, the second
subscript indicates the order of the moment, and the superscript is a prime
($'$) for raw moments (moments about the origin) and is unprimed for central
moments (moments about the mean). The moments can be used on their own
to provide approximations for probabilities of aggregate claims by matching
the first few model and sample moments.

Example 6.4 The observed mean (and standard deviation) of the number of
claims and the individual losses over the past 10 months are 6.7 (2.3) and
179,247 (52,141), respectively. Determine the mean and variance of aggregate
claims per month.

$$\mathrm{E}(S) = 6.7(179{,}247) = 1{,}200{,}955,$$

$$\mathrm{Var}(S) = 6.7(52{,}141)^2 + (2.3)^2(179{,}247)^2 = 1.88180 \times 10^{11}.$$

Hence, the mean and standard deviation of aggregate claims are 1,200,955
and 433,797, respectively. □

Example 6.5 (Example 6.4 continued) Using normal and lognormal
distributions as approximating distributions for aggregate claims, calculate the
probability that claims will exceed 140% of expected costs. That is,

$$\Pr(S > 1.40 \times 1{,}200{,}955) = \Pr(S > 1{,}681{,}337).$$

For the normal distribution

$$\Pr(S > 1{,}681{,}337) = \Pr\left(\frac{S - \mathrm{E}(S)}{\sqrt{\mathrm{Var}(S)}} > \frac{1{,}681{,}337 - 1{,}200{,}955}{433{,}797}\right)
= \Pr(Z > 1.107) = 1 - \Phi(1.107) = 0.134.$$

For the lognormal distribution, from Appendix A, the mean and second
raw moment of the lognormal distribution are

$$\mathrm{E}(S) = \exp(\mu + \tfrac{1}{2}\sigma^2) \quad \text{and} \quad \mathrm{E}(S^2) = \exp(2\mu + 2\sigma^2).$$

Equating these to $1.200955 \times 10^6$ and
$1.88180 \times 10^{11} + (1.200955 \times 10^6)^2 = 1.63047 \times 10^{12}$ and taking
logarithms results in the following two equations in two unknowns:

$$\mu + \tfrac{1}{2}\sigma^2 = 13.99863, \qquad 2\mu + 2\sigma^2 = 28.11989.$$

From this $\mu = 13.93731$ and $\sigma^2 = 0.1226361$. Then

$$\Pr(S > 1{,}681{,}337) = 1 - \Phi\left[\frac{\ln 1{,}681{,}337 - 13.93731}{(0.1226361)^{0.5}}\right] = 1 - \Phi(1.135913) = 0.128.$$

The normal distribution provides a good approximation when $\mathrm{E}(N)$ is large.
In particular, if $N$ has the Poisson, binomial, or negative binomial
distribution, a version of the central limit theorem indicates that, as $\lambda$, $m$,
or $r$, respectively, goes to infinity, the distribution of $S$ becomes normal. In
this example, $\mathrm{E}(N)$ is small so the distribution of $S$ is likely to be skewed. In
this case the lognormal distribution may provide a good approximation, although
there is no theory to support this choice. □
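
Both approximations in Example 6.5 can be reproduced with the standard library. In this sketch the standard normal cdf is written in terms of `math.erf`, and the lognormal parameters are obtained by matching the mean and second raw moment; variable names are illustrative, and the inputs are the rounded moments from Example 6.4.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean_s, var_s = 1_200_955.0, 1.88180e11
threshold = 1.40 * mean_s

# Normal approximation: match mean and variance.
z_normal = (threshold - mean_s) / math.sqrt(var_s)
normal_tail = 1.0 - std_normal_cdf(z_normal)

# Lognormal approximation: match E(S) and E(S^2).
second_raw = var_s + mean_s ** 2
sigma2 = math.log(second_raw) - 2.0 * math.log(mean_s)
mu = math.log(mean_s) - 0.5 * sigma2
z_lognormal = (math.log(threshold) - mu) / math.sqrt(sigma2)
lognormal_tail = 1.0 - std_normal_cdf(z_lognormal)

print(round(normal_tail, 3), round(mu, 5), round(sigma2, 5), round(lognormal_tail, 3))
```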


Example 6.6 (Group dental insurance) Under a group dental insurance plan
covering employees and their families, the premium for each married employee
is the same regardless of the number of family members. The insurance
company has compiled statistics showing that the annual cost (adjusted to current
dollars) of dental care per person for the benefits provided by the plan has the
distribution given in Table 6.1 (given in units of 25 dollars). The frequency
distribution, $p_n$, is given in Table 6.2.

Table 6.1 Loss distribution for Example 6.6

 x     f_X(x)
 1     0.150
 2     0.200
 3     0.250
 4     0.125
 5     0.075
 6     0.050
 7     0.050
 8     0.050
 9     0.025
10     0.025

Table 6.2 Frequency distribution for Example 6.6

 n     p_n
 0     0.05
 1     0.10
 2     0.15
 3     0.20
 4     0.25
 5     0.15
 6     0.06
 7     0.03
 8     0.01

The insurer is now in a position to calculate the distribution of the cost per
year per married employee in the group. The cost per married employee is

$$f_S(x) = \sum_{n=0}^{8} p_n f_X^{*n}(x).$$

Determine the pf of $S$ up to 525. Determine the mean and standard
deviation of total payments per employee.

The distribution up to amounts of 525 is given in Table 6.3. To obtain
$f_S(x)$, each row of the matrix of convolutions of $f_X(x)$ is multiplied by the
probabilities from the row below the table and the products are summed.

Table 6.3 Aggregate probabilities for Example 6.6

 x    f*0   f*1    f*2     f*3     f*4     f*5     f*6     f*7     f*8     f_S(x)
 0    1     0      0       0       0       0       0       0       0       .05000
 1    0     .150   0       0       0       0       0       0       0       .01500
 2    0     .200   .02250  0       0       0       0       0       0       .02338
 3    0     .250   .06000  .00338  0       0       0       0       0       .03468
 4    0     .125   .11500  .01350  .00051  0       0       0       0       .03258
 5    0     .075   .13750  .03488  .00270  .00008  0       0       0       .03579
 6    0     .050   .13500  .06144  .00878  .00051  .00001  0       0       .03981
 7    0     .050   .10750  .08569  .01999  .00198  .00009  .00000  0       .04356
 8    0     .050   .08813  .09750  .03580  .00549  .00042  .00002  .00000  .04752
 9    0     .025   .07875  .09841  .05266  .01194  .00136  .00008  .00000  .04903
10    0     .025   .07063  .09338  .06682  .02138  .00345  .00031  .00002  .05190
11    0     0      .06250  .08813  .07597  .03282  .00726  .00091  .00007  .05138
12    0     0      .04500  .08370  .08068  .04450  .01305  .00218  .00022  .05119
13    0     0      .03125  .07673  .08266  .05486  .02062  .00448  .00060  .05030
14    0     0      .01750  .06689  .08278  .06314  .02930  .00808  .00138  .04818
15    0     0      .01125  .05377  .08081  .06934  .03826  .01304  .00279  .04576
16    0     0      .00750  .04125  .07584  .07361  .04677  .01919  .00505  .04281
17    0     0      .00500  .03052  .06811  .07578  .05438  .02616  .00829  .03938
18    0     0      .00313  .02267  .05854  .07552  .06080  .03352  .01254  .03575
19    0     0      .00125  .01673  .04878  .07263  .06573  .04083  .01768  .03197
20    0     0      .00063  .01186  .03977  .06747  .06882  .04775  .02351  .02832
21    0     0      0       .00800  .03187  .06079  .06982  .05389  .02977  .02479

p_n   .05   .10    .15     .20     .25     .15     .06     .03     .01

The reader may wish to verify using (6.6) that the first two moments of
the distribution $f_S(x)$ are

$$\mathrm{E}(S) = 12.58, \qquad \mathrm{Var}(S) = 58.7464.$$

Hence the annual cost of the dental plan has mean $12.58 \times 25 = 314.50$ dollars
and standard deviation 191.6155 dollars. (Why can't the calculations be done
from Table 6.3?) □

It is common for insurance to be offered in which a deductible is applied to
the aggregate losses for the period. When the losses occur to a policyholder
it is called insurance coverage and when the losses occur to an insurance
company it is called reinsurance coverage. The latter version is a common
method for an insurance company to protect itself against an adverse year (as
opposed to protecting against a single, very large claim). More formally, we
present the following definition.

Definition 6.7 Insurance on the aggregate losses, subject to a deductible, is
called stop-loss insurance. The expected cost of this insurance is called the
net stop-loss premium and can be computed as $\mathrm{E}[(S - d)_+]$, where $d$ is the
deductible and the notation $(\cdot)_+$ means to use the value in parentheses if it is
positive but to use zero otherwise.

For any aggregate distribution,

$$\mathrm{E}[(S - d)_+] = \int_d^{\infty} [1 - F_S(x)]\, dx.$$
















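For a continuous aggregate distribution,

$$\mathrm{E}[(S - d)_+] = \int_d^{\infty} (x - d) f_S(x)\, dx.$$

Similarly, for discrete random variables,

$$\mathrm{E}[(S - d)_+] = \sum_{x > d} (x - d) f_S(x).$$

Any time there is an interval with no aggregate probability, the following
result may simplify calculations.

Theorem 6.8 Suppose $\Pr(a < S < b) = 0$. Then, for $a \le d \le b$,

$$\mathrm{E}[(S - d)_+] = \frac{b - d}{b - a}\,\mathrm{E}[(S - a)_+] + \frac{d - a}{b - a}\,\mathrm{E}[(S - b)_+].$$

That is, the net stop-loss premium can be calculated via linear interpolation.

Proof: From the assumption, $F_S(x) = F_S(a)$, $a \le x < b$. Then,

$$\begin{aligned}
\mathrm{E}[(S - d)_+] &= \int_d^{\infty} [1 - F_S(x)]\, dx \\
&= \int_a^{\infty} [1 - F_S(x)]\, dx - \int_a^d [1 - F_S(x)]\, dx \\
&= \mathrm{E}[(S - a)_+] - \int_a^d [1 - F_S(a)]\, dx \\
&= \mathrm{E}[(S - a)_+] - (d - a)[1 - F_S(a)]. \qquad (6.7)
\end{aligned}$$

Then, by setting $d = b$ in (6.7),

$$\mathrm{E}[(S - b)_+] = \mathrm{E}[(S - a)_+] - (b - a)[1 - F_S(a)]$$

and therefore

$$1 - F_S(a) = \frac{\mathrm{E}[(S - a)_+] - \mathrm{E}[(S - b)_+]}{b - a}.$$

Substituting this in (6.7) produces the desired result. □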

Further simplification is available in the discrete case, provided $S$ places
probability at equally spaced values.

Theorem 6.9 Assume $\Pr(S = kh) = f_k \ge 0$ for some fixed $h > 0$ and
$k = 0, 1, \ldots$ and $\Pr(S = x) = 0$ for all other $x$. Then, provided $d = jh$, with
$j$ a non-negative integer,

$$\mathrm{E}[(S - d)_+] = h \sum_{m=0}^{\infty} \{1 - F_S[(m + j)h]\}.$$

Proof:

$$\begin{aligned}
\mathrm{E}[(S - d)_+] &= \sum_{x > d} (x - d) f_S(x) \\
&= \sum_{k=j}^{\infty} (kh - jh) f_k \\
&= h \sum_{k=j}^{\infty} \sum_{m=0}^{k-j-1} f_k \\
&= h \sum_{m=0}^{\infty} \sum_{k=m+j+1}^{\infty} f_k \\
&= h \sum_{m=0}^{\infty} \{1 - F_S[(m + j)h]\}. \qquad □
\end{aligned}$$

In the discrete case with probability at equally spaced values, a simple
recursion holds.

Corollary 6.10 Under the conditions of Theorem 6.9,

$$\mathrm{E}\{[S - (j + 1)h]_+\} = \mathrm{E}[(S - jh)_+] - h[1 - F_S(jh)].$$

This result is easy to use because, when $d = 0$, $\mathrm{E}[(S - 0)_+] = \mathrm{E}(S) =
\mathrm{E}(N)\mathrm{E}(X)$, which can be obtained directly from the frequency and severity
distributions.

Example 6.11 (Example 6.6 continued) The insurer is examining the effect
of imposing an aggregate deductible per employee. Determine the reduction in
the net premium as a result of imposing deductibles of 25, 30, 50, and 100
dollars.

From Table 6.3, the cdf at 0, 25, 50, and 75 dollars has values 0.05, 0.065,
0.08838, and 0.12306. With $\mathrm{E}(S) = 25(12.58) = 314.5$ we have

$$\begin{aligned}
\mathrm{E}[(S - 25)_+] &= 314.5 - 25(1 - 0.05) = 290.75, \\
\mathrm{E}[(S - 50)_+] &= 290.75 - 25(1 - 0.065) = 267.375, \\
\mathrm{E}[(S - 75)_+] &= 267.375 - 25(1 - 0.08838) = 244.5845, \\
\mathrm{E}[(S - 100)_+] &= 244.5845 - 25(1 - 0.12306) = 222.661.
\end{aligned}$$

From Theorem 6.8, $\mathrm{E}[(S - 30)_+] = \frac{20}{25}\,290.75 + \frac{5}{25}\,267.375 = 286.07$. When
compared to the original premium of 314.5, the reductions are 23.75, 28.43,
47.125, and 91.839, respectively.
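
The recursion of Corollary 6.10 and the interpolation of Theorem 6.8 can be sketched as follows. The function names are illustrative; the inputs are the mean and the cdf values at 0, 25, 50, and 75 dollars quoted in Example 6.11.

```python
def stop_loss_premiums(mean_s, cdf_values, h):
    """Apply E[(S-(j+1)h)+] = E[(S-jh)+] - h[1 - F_S(jh)] starting from E[(S-0)+] = E(S).

    cdf_values[j] must be F_S(jh). Returns the premiums at d = 0, h, 2h, ...
    """
    premiums = [mean_s]
    for cdf in cdf_values:
        premiums.append(premiums[-1] - h * (1.0 - cdf))
    return premiums


def interpolate_premium(d, a, b, premium_a, premium_b):
    """Linear interpolation of Theorem 6.8, valid when Pr(a < S < b) = 0."""
    return (b - d) / (b - a) * premium_a + (d - a) / (b - a) * premium_b


premiums = stop_loss_premiums(314.5, [0.05, 0.065, 0.08838, 0.12306], h=25)
premium_30 = interpolate_premium(30, 25, 50, premiums[1], premiums[2])
reductions = [314.5 - premiums[1], 314.5 - premium_30,
              314.5 - premiums[2], 314.5 - premiums[4]]
print(premiums, premium_30, reductions)
```

Here `premiums[j]` is the net stop-loss premium for a deductible of `25 * j` dollars, so the deductible of 100 corresponds to `premiums[4]`.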
6.3.1 Exercises

6.4 From (6.5), show that the relationships between the moments in (6.6)
hold.

6.5 When an individual is admitted to the hospital, the hospital charges have
the following characteristics:

1.
Charges    Mean     Standard deviation
Room       1,000    500
Other        500    300

2. The covariance between an individual's room charges and other charges
is 100,000.

An insurer issues a policy that reimburses 100% for room charges and
80% for other charges. The number of hospital admissions has a Poisson
distribution with parameter 4. Determine the mean and standard deviation
of the insurer's payout for the policy.

6.6 Aggregate claims have been modeled by a compound negative binomial
distribution with parameters $r = 15$ and $\beta = 5$. The claim amounts are
uniformly distributed on the interval (0,10). Using the normal approximation,
determine the premium such that the probability that claims will exceed
premium is 0.05.

6.7 Automobile drivers can be divided into three homogeneous classes. The
number of claims for each driver follows a Poisson distribution with parameter
$\lambda$. Determine the variance of the number of claims for a randomly selected
driver, using the following data.

Proportion
of population    $\lambda$
0.25             5
0.25             3
0.50             2

6.8 Assume $X_1$, $X_2$, and $X_3$ are mutually independent loss random variables
with probability functions as given in Table 6.4. Determine the pf of
$S = X_1 + X_2 + X_3$.

6.9 Assume $X_1$, $X_2$, and $X_3$ are mutually independent random variables
with probability functions as given in Table 6.5. If $S = X_1 + X_2 + X_3$ and
$f_S(5) = 0.06$, determine $p$.

Table 6.4 Distributions for Exercise 6.8

x    f_1(x)    f_2(x)    f_3(x)
0    0.90      0.50      0.25
1    0.10      0.30      0.25
2    0.00      0.20      0.25
3    0.00      0.00      0.25

Table 6.5 Distributions for Exercise 6.9

x    f_1(x)    f_2(x)    f_3(x)
0    p         0.6       0.25
1    1 - p     0.2       0.25
2    0         0.1       0.25
3    0         0.1       0.25

6.10 Consider the following information about AIDS patients.

1. The conditional distribution of an individual's medical care costs, given
that the individual does not have AIDS, has mean 1,000 and variance
250,000.

2. The conditional distribution of an individual's medical care costs, given
that the individual does have AIDS, has mean 70,000 and variance
1,600,000.

3. The number of individuals with AIDS in a group of $m$ randomly selected
adults has a binomial distribution with parameters $m$ and $q = 0.01$.

An insurance company determines premiums for a group as the mean
plus 10% of the standard deviation of the group's aggregate claims
distribution. The premium for a group of 10 independent lives for which all
individuals have been proven not to have AIDS is $P$. The premium for a group of
10 randomly selected adults is $Q$. Determine $P/Q$.

6.11 You have been asked by a city planner to analyze office cigarette smoking
patterns. The planner has provided the information in Table 6.6 about the
distribution of the number of cigarettes smoked during a workday.

The number of male employees in a randomly selected office of $N$
employees has a binomial distribution with parameters $N$ and 0.4. Determine the
mean plus the standard deviation of the number of cigarettes smoked during
a workday in a randomly selected office of eight employees.

Table 6.6 Data for Exercise 6.11

Female    3    31

6.12 For a certain group, aggregate claims are uniformly distributed over
(0,10). Insurer A proposes stop-loss coverage with a deductible of 6 for a
premium equal to the expected stop-loss claims. Insurer B proposes group
coverage with a premium of 7 and a dividend (a premium refund) equal to
the excess, if any, of $7k$ over claims. Calculate $k$ such that the expected cost
to the group is equal under both proposals.

6.13 For a group health contract, aggregate claims are assumed to have an
exponential distribution where the mean of the distribution is estimated by the
group underwriter. Aggregate stop-loss insurance for total claims in excess of
125% of the expected claims is provided by a gross premium that is twice the
expected stop-loss claims. You have discovered an error in the underwriter's
method of calculating expected claims. The underwriter's estimate is 90% of
the correct estimate. Determine the actual percentage loading in the premium.

6.14 A random loss, $X$, has the following probability function. You are given
that $\mathrm{E}(X) = 4$ and $\mathrm{E}[(X - d)_+] = 2$. Determine $d$.

x    f(x)
0    0.05
1    0.06
2    0.25
3    0.22
4    0.10
5    0.05
6    0.05
7    0.05
8    0.05
9    0.12

6.15 A reinsurer pays aggregate claim amounts in excess of $d$, and in return
it receives a stop-loss premium $\mathrm{E}[(S - d)_+]$. You are given
$\mathrm{E}[(S - 100)_+] = 15$, $\mathrm{E}[(S - 120)_+] = 10$, and the probability that the
aggregate claim amounts are greater than 80 and less than or equal to 120 is 0.
Determine the probability that the aggregate claim amounts are less than or
equal to 80.


6.18 An insurance portfolio produces $N$ claims, where

n    Pr(N = n)
0    0.5
1    0.4
3    0.1

Individual claim amounts have the following distribution:

x     f_X(x)
1     0.9
10    0.1

Individual claim amounts and $N$ are mutually independent. Calculate the
probability that the ratio of aggregate claims to expected claims will exceed
3.0.





2 2 g(l - g) = 2,500, where q is the probability of death by travel accident 
Dr an individual. 

certain group of 100 lives, the independence assumption fails because 
ipecific individuals always travel together. If one dies in an accident, 
… ’. 3 . Determine the difference between 




Probability of 
claim 


All claims are mutually independent. The insurance company's retention
limit is 2 units per life. Reinsurance is purchased for 0.03 per unit. The
probability that the insurance company's retained claims, $S$, plus cost of
reinsurance will exceed 1,000 is

$$\Pr\left[\frac{S - \mathrm{E}(S)}{\sqrt{\mathrm{Var}(S)}} > K\right].$$

Determine $K$ using a normal approximation.

6.21 The probability density function of individual losses Y is 


The amount paid, $Z$, is 80% of that portion of the loss that exceeds a
deductible of 10. Determine $\mathrm{E}(Z)$.

6.22 An individual loss distribution is normal with $\mu = 100$ and $\sigma^2 = 9$. The
distribution for the number of claims, $N$, is given in Table 6.7. Determine the
probability that aggregate claims exceed 100.


Table 6.7 Distribution for Exercise 6.22


0 


Table 6.8 Distribution for Exercise 6.23

n    f(n)
0    1/16
1    1/4
2    3/8
3    1/4
4    1/16


6.23 An employer self-insures a life insurance program with the following 
characteristics: 

1. Given that a claim has occurred, the claim amount is 2,000 with prob¬ 
ability 0.4 and 3,000 with probability 0.6, 

2. The number of claims has the distribution given in Table 6.8.



6.4 ANALYTIC RESULTS 

For most choices of distributions of N and the XjS, the compound distribu¬ 
tional values can only be obtained numerically. Subsequent sections in this 
chapter are devoted to such numerical procedures. 

However, for certain combinations of choices, simple analytic results are 
available, thus reducing the computational problems considerably. 
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Example 6.12 (Compound geometric-exponential) Suppose … o/re 

i.i.d. with common exponential distribution with mean $\theta$ and mgf
$M_X(z) = (1 - \theta z)^{-1}$. Suppose that $N$ has a geometric distribution with
parameter $\beta$ and pgf $P_N(z) = [1 - \beta(z - 1)]^{-1}$ (see Appendix B).
Determine the distribution of $S$.

The mgf of $S$ is

$$M_S(z) = P_N[M_X(z)] = \{1 - \beta[(1 - \theta z)^{-1} - 1]\}^{-1} = \frac{1}{1+\beta} + \frac{\beta}{1+\beta}[1 - \theta(1+\beta)z]^{-1}$$

with a bit of algebra.

This is a two-point mixture of a degenerate distribution with probability 1
at zero and an exponential distribution with mean $\theta(1+\beta)$. Hence,
$\Pr(S = 0) = (1+\beta)^{-1}$, and for $x > 0$, $S$ has pdf

$$f_S(x) = \frac{\beta}{\theta(1+\beta)^2}\exp\left[-\frac{x}{\theta(1+\beta)}\right].$$

It has a point mass of $(1+\beta)^{-1}$ at zero and an exponentially decaying density
over the positive axis. Its cdf can be written as

$$F_S(x) = 1 - \frac{\beta}{1+\beta}\exp\left[-\frac{x}{\theta(1+\beta)}\right], \quad x \ge 0.$$

It has a jump at zero and is continuous otherwise. This example will arise
again in Chapter 8 in connection with ruin theory. □
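The closed form for the compound geometric-exponential cdf can be checked numerically. The following is a minimal sketch, not part of the original text: it compares the closed form with a truncated sum of geometric probabilities times Erlang (integer-shape gamma) cdfs. The function names, the truncation point `n_max`, and the parameter values used in any call are illustrative assumptions.

```python
import math

def erlang_cdf(n, x):
    """cdf at x >= 0 of a gamma distribution with integer shape n and scale 1,
    computed as 1 - exp(-x) * sum_{j<n} x^j / j!."""
    term, total = 1.0, 0.0
    for j in range(n):
        if j > 0:
            term *= x / j          # term now equals x^j / j!
        total += term
    return 1.0 - math.exp(-x) * total

def cdf_closed_form(x, beta, theta):
    """Closed-form cdf of S for geometric frequency and exponential severity."""
    return 1.0 - beta / (1.0 + beta) * math.exp(-x / (theta * (1.0 + beta)))

def cdf_series(x, beta, theta, n_max=400):
    """Truncated series p_0 + sum_n p_n * ErlangCDF(n, x / theta)."""
    p = 1.0 / (1.0 + beta)         # Pr(N = 0) for the geometric distribution
    ratio = beta / (1.0 + beta)    # p_n / p_{n-1}
    total = p
    for n in range(1, n_max + 1):
        p *= ratio
        total += p * erlang_cdf(n, x / theta)
    return total
```

With, say, `beta = 3` and `theta = 10`, the two functions should agree to many decimal places, and both return $(1+\beta)^{-1}$ at `x = 0`, which is the point mass at zero.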

Example 6.13 (Exponential severities) Determine the cdf of S for any com¬ 
pound distribution with exponential severities. 

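The mgf of the sum of $n$ independent exponential random variables each
with mean $\theta$ is

$$M_{X_1 + X_2 + \cdots + X_n}(z) = (1 - \theta z)^{-n},$$

which is the mgf of the gamma distribution with cdf

$$F_X^{*n}(x) = \Gamma\left(n; \frac{x}{\theta}\right)$$

(see Appendix A). For integer values of $\alpha$, the values of $\Gamma(\alpha; x)$ can be
calculated exactly (see Appendix A for the derivation) as

$$\Gamma(n; x) = 1 - \sum_{j=0}^{n-1} \frac{x^j e^{-x}}{j!}, \quad n = 1, 2, 3, \ldots. \qquad (6.8)$$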


From (6.3) 
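$$F_S(x) = p_0 + \sum_{n=1}^{\infty} p_n \Gamma\left(n; \frac{x}{\theta}\right).$$

Substituting (6.8) yields

$$F_S(x) = 1 - \sum_{n=1}^{\infty} p_n \sum_{j=0}^{n-1} \frac{(x/\theta)^j e^{-x/\theta}}{j!}, \quad x \ge 0.$$

Interchanging the order of summation gives

$$F_S(x) = 1 - e^{-x/\theta} \sum_{j=0}^{\infty} \bar{P}_j \frac{(x/\theta)^j}{j!}, \quad x \ge 0, \qquad \text{where } \bar{P}_j = \sum_{n=j+1}^{\infty} p_n.$$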
which is of the form

$$M_S(z) = P_N^*[M_X^*(z)],$$

where

$$P_N^*(z) = \left[1 + \frac{\beta}{1+\beta}(z - 1)\right]^r,$$

the pgf of the binomial distribution with parameters $r$ and $\beta/(1+\beta)$, and
$M_X^*(z)$ is the mgf of the exponential distribution with mean $\theta(1+\beta)$.

This transformation reduces the computation of the distribution function
to the finite sum of the form of (6.10), that is,

DJ uiscrete aistn d uiiuxib aiou uc uocu. ---- 

Which of the distributions in Appendix B are closed under convolution?
How can this information be used in simplifying calculation
of compound probabilities of the form (4.20)?

compound negative binomial distribution has parameters /? = 1 ， 
and severity distribution {fx{x)\ x = 0,1,2,...}. How do the pa¬ 
rs of the distribution change if the severity distribution is {gx(x)= 
[1 一 / x ⑼]; x = 1,2, .. .} but the aggregate claims distribution remains 



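Consider the compound logarithmic distribution with exponential severity
distribution.

(a) Show that the density of aggregate losses may be expressed as

$$f_S(x) = \frac{1}{\ln(1+\beta)} \sum_{n=1}^{\infty} \frac{1}{n!} \left[\frac{\beta}{\theta(1+\beta)}\right]^n x^{n-1} e^{-x/\theta}.$$

(b) Reduce this to

$$f_S(x) = \frac{\exp\{-x/[\theta(1+\beta)]\} - \exp(-x/\theta)}{x \ln(1+\beta)}.$$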


The number of accidents incurred by an insured driver in a single year has
a Poisson distribution with parameter $\lambda = 2$. If an accident occurs, the
probability that the damage amount exceeds the deductible is 0.25. The
number of claims and the damage amounts are independent. What is the
probability that there will be no damages exceeding the deductible in a single
year?


The aggregate loss distribution is modeled by an insurance company
using an exponential distribution. However, the mean is uncertain. The
company uses a uniform distribution (2,000,000, 4,000,000) to express its
view of what the mean should be. Determine the expected aggregate losses.











in days, T, has the following continuance (survival) function for 0 < t < 30: 


Pr(T > t) = 0.95 - 0.035t,   10 < t < 20,
            0.65 - 0.02t,    20 < t < 30.

For a policy period, each member’s probability of a single hospital admission 
is 0.1 and of more than one admission is 0. Determine the pure premium per 
member, ignoring the time value of money. 


pound Poisson distributions as follows: 


Medical claims Uniform (0,1,000) 2 

Dental claims Uniform (0,200) 3 

Let $X$ be the amount of a given claim under a policy which covers both
medical and dental claims. Determine $E[(X - 100)_+]$, the expected cost (in
excess of 100) of any given claim.


6.34 For a certain insured, the distribution of aggregate claims is binomial 
(b) Show that the mgf of $S$ is


$$C(z) = \sum_{k=0}^{\infty} c_k z^k = P_N\{Q(z)\}.$$

(c) Describe how the distribution $\{c_k;\ k = 0, 1, 2, \ldots\}$ may be
calculated recursively if the number of claims distribution is a member
of the $(a, b, 1)$ class (Section 4.6.6).

(d) Show that the distribution function of $S$ is given by

$$F_S(x) = 1 - e^{-x/\theta} \sum_{j=0}^{\infty} \bar{C}_j \frac{(x/\theta)^j}{j!},$$

where $\bar{C}_j = \sum_{n=j+1}^{\infty} c_n$.


6.5 COMPUTING THE AGGREGATE CLAIMS DISTRIBUTION

The computation of the compound distribution function 


$$F_S(x) = \sum_{n=0}^{\infty} p_n F_X^{*n}(x) \qquad (6.11)$$


or the corresponding probability (density) function is generally not an easy 
task, even in the simplest of cases. In this section we discuss a number of 



approximating 

approach was used in Example 6.5 where the 
ersof the approximating 
the continuous type and there is a maximum possible claim (for example,
when there is a policy limit), the severity distribution may have a point mass
(“atom” or “spike”) at the maximum. The true aggregate claims distribution
is of the mixed type with spikes at integral multiples of the maximum
corresponding to 1, 2, 3, ... claims at the maximum. These spikes, if large, can have
a significant effect on the probabilities near such multiples. These jumps in
the aggregate claims distribution function cannot be replicated by a smooth
approximating distribution.

The second method to evaluate (6.11) or the corresponding pdf is direct
calculation. The most difficult (or computer intensive) part is the evaluation
of the $n$-fold convolutions of the severity distribution for $n = 2, 3, 4, \ldots$.
In some situations, there is an analytic form, for example, when the severity
distribution is closed under convolution, as defined in Example 6.15 and
illustrated in Examples 6.12-6.14. Otherwise the convolutions need to be
evaluated numerically using

$$F_X^{*k}(x) = \int_{-\infty}^{\infty} F_X^{*(k-1)}(x - y)\,dF_X(y). \qquad (6.12)$$

When the losses are limited to nonnegative values (as is usually the case), the
range of integration becomes finite, reducing (6.12) to

$$F_X^{*k}(x) = \int_{0}^{x} F_X^{*(k-1)}(x - y)\,dF_X(y). \qquad (6.13)$$

These integrals are written in Lebesgue-Stieltjes form because of possible
jumps in the cdf $F_X(x)$ at zero and at other points.¹ Numerical evaluation
of (6.13) requires numerical integration methods. Because of the first term
inside the integral, (6.13) needs to be evaluated for all possible values of $x$.
This quickly becomes technically overpowering!

A simple way to avoid these technical problems is to replace the severity
distribution by a discrete distribution defined at multiples 0, 1, 2, ... of some
convenient monetary unit such as 1,000. This reduces (6.13) to (in terms of
the new monetary unit)

$$F_X^{*k}(x) = \sum_{y=0}^{x} F_X^{*(k-1)}(x - y) f_X(y).$$

The corresponding pf is

$$f_X^{*k}(x) = \sum_{y=0}^{x} f_X^{*(k-1)}(x - y) f_X(y).$$
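The discrete convolution formula translates directly into code. The following is a minimal sketch, assuming a frequency distribution with finite support and a severity pf on nonnegative integers; the function names `convolve` and `aggregate_pf_direct` are illustrative and do not come from the text.

```python
def convolve(f, g, n):
    """pf on 0..n of the sum of two independent variables with pfs f and g.
    f must have length n + 1; g may be shorter (zero beyond its last index)."""
    h = [0.0] * (n + 1)
    for x in range(n + 1):
        top = min(x, len(g) - 1)
        h[x] = sum(f[x - y] * g[y] for y in range(top + 1))
    return h

def aggregate_pf_direct(freq, fx, n):
    """f_S(x) = sum_k p_k f_X^{*k}(x) for x = 0..n by direct convolution.
    freq = [p_0, ..., p_K] is a frequency pf with finite support."""
    conv = [1.0] + [0.0] * n            # f_X^{*0}: point mass at zero
    fs = [freq[0] * c for c in conv]
    for k in range(1, len(freq)):
        conv = convolve(conv, fx, n)    # f_X^{*k} from f_X^{*(k-1)}
        for x in range(n + 1):
            fs[x] += freq[k] * conv[x]
    return fs
```

For example, with frequency probabilities 1/16, 1/4, 3/8, 1/4, 1/16 on 0 to 4 claims and claim amounts of 2 or 3 units with probabilities 0.4 and 0.6, `aggregate_pf_direct([1/16, 1/4, 3/8, 1/4, 1/16], [0, 0, 0.4, 0.6], 12)` returns the full aggregate pf. Each convolution costs on the order of $n^2$ multiplications, which is why the direct approach becomes expensive when many convolutions are needed.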

¹ Without going into the formal definition of the Lebesgue-Stieltjes integral, it suffices to
interpret $\int g(y)\,dF_X(y)$ as to be evaluated by integrating $g(y) f_X(y)$ over those $y$ values for
which $X$ has a continuous distribution and then adding $g(y_i)\Pr(X = y_i)$ over those points
where $\Pr(X = y_i) > 0$. This allows for a single notation to be used for continuous, discrete,
and mixed random variables.


to the nearest multiple of the mon- 

r claims to the nearest 1,000. More 
Lter in this chapter, 
bion is defined on nonnegative integers 0 , 1 ， 2 ,…， 

$x$ requires $x + 1$ multiplications. Then carrying

possible values of $k$ and $x$ up to $n$ requires a

at are of order $n^3$, written as $O(n^3)$, to obtain

the distribution of (6.11) for $x = 0$ to $x = n$. When the maximum value, $n$,

for which the aggregate claims distribution is calculated is large, the number
of computations quickly becomes prohibitive, even for fast computers. For
example, in real applications $n$ can easily be as large as 1,000. This requires
about $10^9$ multiplications. Further, if $\Pr(X = 0) > 0$, an infinite number
of calculations are required to obtain any single probability. This is because
$F_X^{*n}(x) > 0$ for all $n$ and all $x$ and so the sum in (6.11) contains an infinite
number of terms. When $\Pr(X = 0) = 0$, we have $F_X^{*n}(x) = 0$ for $n > x$ and
so (6.11) will have no more than $x + 1$ positive terms. Table 6.3 provides an
example of this latter case.

Alternative methods to more quickly evaluate the aggregate claims
distribution are discussed in the next two sections. The first such method, the
recursive method, reduces the number of computations discussed above
to $O(n^2)$, which is a considerable savings in computer time, a reduction of
about 99.9% when $n = 1{,}000$ compared to direct calculation. However, the
method is limited to certain frequency distributions. Fortunately, it includes
all frequency distributions discussed in Section 4.6 and Appendix B.

The second such method, the inversion method, numerically inverts a
transform, such as the characteristic function, using a general or specialized
inversion software package. Two versions of this method are discussed in this
chapter.


6.6 THE RECURSIVE METHOD

Suppose that the severity distribution $f_X(x)$ is defined on $0, 1, 2, \ldots, m$
representing multiples of some convenient monetary unit. The number $m$ represents
the largest possible payment and could be infinite. Further, suppose that the
frequency distribution, $p_k$, is a member of the $(a, b, 1)$ class and therefore
satisfies $p_k = (a + b/k)\,p_{k-1}$ for $k = 2, 3, 4, \ldots$.
Theorem 6.16 For the $(a, b, 1)$ class,

$$f_S(x) = \frac{[p_1 - (a + b)p_0] f_X(x) + \sum_{y=1}^{x \wedge m} (a + by/x) f_X(y) f_S(x - y)}{1 - a f_X(0)}, \qquad (6.14)$$

noting that $x \wedge m$ is notation for $\min(x, m)$.

Proof: This result is identical to Theorem 4.49 with appropriate substitution 
of notation and recognition that the argument of fx(x) cannot exceed m. □ 

Corollary 6.17 For the $(a, b, 0)$ class, the result (6.14) reduces to

$$f_S(x) = \frac{\sum_{y=1}^{x \wedge m} (a + by/x) f_X(y) f_S(x - y)}{1 - a f_X(0)}. \qquad (6.15)$$

Note that when the severity distribution has no probability at zero, the
denominator of (6.14) and (6.15) equals 1. Further, in the case of the Poisson
distribution, (6.15) reduces to

$$f_S(x) = \frac{\lambda}{x} \sum_{y=1}^{x \wedge m} y f_X(y) f_S(x - y), \quad x = 1, 2, \ldots. \qquad (6.16)$$

The starting value of the recursive schemes (6.14) and (6.15) is
$f_S(0) = P_N[f_X(0)]$ following Theorem 4.51 with an appropriate change of notation.
In the case of the Poisson distribution we have

$$f_S(0) = e^{-\lambda[1 - f_X(0)]}.$$

Starting values for other frequency distributions are found in Appendix D.
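The recursion for the $(a, b, 1)$ class can be sketched in a few lines. This is an illustrative implementation rather than code from the text; the function names and argument order are assumptions. The caller supplies the starting value $f_S(0) = P_N[f_X(0)]$, since its form depends on the frequency distribution.

```python
import math

def panjer(a, b, p0, p1, fx, fs0, n):
    """Recursive aggregate pf f_S(0..n) for an (a, b, 1) frequency distribution.
    fx[y] is the severity pf on y = 0..m, fs0 is the starting value P_N[f_X(0)],
    and p0, p1 are the frequency probabilities at 0 and 1."""
    m = len(fx) - 1
    denom = 1.0 - a * fx[0]
    fs = [fs0]
    for x in range(1, n + 1):
        # First term vanishes for (a, b, 0) members, where p1 = (a + b) * p0.
        head = (p1 - (a + b) * p0) * (fx[x] if x <= m else 0.0)
        tail = sum((a + b * y / x) * fx[y] * fs[x - y]
                   for y in range(1, min(x, m) + 1))
        fs.append((head + tail) / denom)
    return fs

def compound_poisson(lam, fx, n):
    """Poisson special case: a = 0, b = lam, f_S(0) = exp(-lam * (1 - f_X(0)))."""
    p0 = math.exp(-lam)
    return panjer(0.0, lam, p0, lam * p0, fx,
                  math.exp(-lam * (1.0 - fx[0])), n)
```

Each value $f_S(x)$ needs at most $\min(x, m)$ multiplications of previously computed values, so computing $f_S(0), \ldots, f_S(n)$ costs on the order of $n^2$ operations (or $nm$ when the severity support is bounded by $m$).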


6.6.1 Applications to compound frequency models 

When the frequency distribution can be represented as a compound
distribution (e.g., Neyman Type A, Poisson-inverse Gaussian) involving only
distributions from the $(a, b, 0)$ or $(a, b, 1)$ classes, the recursive formula (6.14) can
be used two or more times to obtain the aggregate claims distribution. If the
frequency distribution can be written as

$$P_N(z) = P_1[P_2(z)],$$

then the aggregate claims distribution has pgf

$$P_S(z) = P_N[P_X(z)] = P_1\{P_2[P_X(z)]\},$$

which can be rewritten as

$$P_S(z) = P_1[P_{S_1}(z)], \qquad (6.17)$$

where

$$P_{S_1}(z) = P_2[P_X(z)]. \qquad (6.18)$$

Now (6.18) is the same form as an aggregate claims distribution. Thus, if
$P_2(z)$ is in the $(a, b, 0)$ or $(a, b, 1)$ class, the distribution of $S_1$ can be calculated
using (6.14). The resulting distribution is the “severity” distribution in (6.18).
Thus, a second application of (6.14) to (6.17) results in the distribution of $S$.
The following example illustrates the use of this algorithm.


Example 6.18 The number of claims has a Poisson-ETNB distribution with
Poisson parameter $\lambda = 2$ and ETNB parameters $\beta = 3$ and $r = 0.2$. The
claim size distribution has probabilities 0.3, 0.5, and 0.2 at 0, 10, and 20,
respectively. Determine the total claims distribution recursively.

In the above terminology, $N$ has pgf $P_N(z) = P_1[P_2(z)]$, where $P_1(z)$ and
$P_2(z)$ are the Poisson and ETNB pgfs, respectively. Then the total dollars of
claims has pgf $P_S(z) = P_1[P_{S_1}(z)]$, where $P_{S_1}(z) = P_2[P_X(z)]$ is a compound
ETNB pgf. We will first compute the distribution of $S_1$. We have (in monetary
units of 10) $f_X(0) = 0.3$, $f_X(1) = 0.5$, and $f_X(2) = 0.2$. In order to use the
compound ETNB recursion, we start with

$$f_{S_1}(0) = P_2[f_X(0)] = \frac{\{1 + \beta[1 - f_X(0)]\}^{-r} - (1+\beta)^{-r}}{1 - (1+\beta)^{-r}} = \frac{\{1 + 3(1 - 0.3)\}^{-0.2} - (1+3)^{-0.2}}{1 - (1+3)^{-0.2}} = 0.16369.$$

The remaining values of $f_{S_1}(x)$ may be obtained from (6.14) with $S$ replaced
by $S_1$. In this case we have $a = 3/(1+3) = 0.75$, $b = (0.2 - 1)a = -0.6$, $p_0 = 0$,
and $p_1 = (0.2)(3)/[(1+3)^{0.2+1} - (1+3)] = 0.46947$. Then (6.14) becomes

$$f_{S_1}(x) = \frac{[0.46947 - (0.75 - 0.6)(0)] f_X(x) + \sum_{y=1}^{x} (0.75 - 0.6y/x) f_X(y) f_{S_1}(x - y)}{1 - (0.75)(0.3)}$$

$$= 0.60577 f_X(x) + 1.29032 \sum_{y=1}^{x} \left(0.75 - 0.6\frac{y}{x}\right) f_X(y) f_{S_1}(x - y).$$

The first few probabilities are

$$f_{S_1}(1) = 0.60577(0.5) + 1.29032\left[0.75 - 0.6\left(\tfrac{1}{1}\right)\right](0.5)(0.16369) = 0.31873,$$

$$f_{S_1}(2) = 0.60577(0.2) + 1.29032\left\{\left[0.75 - 0.6\left(\tfrac{1}{2}\right)\right](0.5)(0.31873) + \left[0.75 - 0.6\left(\tfrac{2}{2}\right)\right](0.2)(0.16369)\right\} = 0.22002,$$

$$f_{S_1}(3) = 1.29032\left\{\left[0.75 - 0.6\left(\tfrac{1}{3}\right)\right](0.5)(0.22002) + \left[0.75 - 0.6\left(\tfrac{2}{3}\right)\right](0.2)(0.31873)\right\} = 0.10686,$$

$$f_{S_1}(4) = 1.29032\left\{\left[0.75 - 0.6\left(\tfrac{1}{4}\right)\right](0.5)(0.10686) + \left[0.75 - 0.6\left(\tfrac{2}{4}\right)\right](0.2)(0.22002)\right\} = 0.06692.$$

We now turn to evaluation of the distribution of $S$ with compound Poisson
pgf

$$P_S(z) = P_1[P_{S_1}(z)] = e^{\lambda[P_{S_1}(z) - 1]}.$$

Thus the distribution

$$\{f_{S_1}(x);\ x = 0, 1, 2, \ldots\}$$

becomes the “secondary” or “claim size” distribution in an application of the
compound Poisson recursive formula. Therefore,

$$f_S(0) = P_S(0) = e^{\lambda[P_{S_1}(0) - 1]} = e^{\lambda[f_{S_1}(0) - 1]} = e^{2(0.16369 - 1)} = 0.18775.$$

The remaining probabilities may be found from the recursive formula

$$f_S(x) = \frac{2}{x} \sum_{y=1}^{x} y f_{S_1}(y) f_S(x - y), \quad x = 1, 2, \ldots.$$

The first few probabilities are

$$f_S(1) = 2\left(\tfrac{1}{1}\right)(0.31873)(0.18775) = 0.11968,$$

$$f_S(2) = 2\left(\tfrac{1}{2}\right)(0.31873)(0.11968) + 2\left(\tfrac{2}{2}\right)(0.22002)(0.18775) = 0.12076,$$

$$f_S(3) = 2\left(\tfrac{1}{3}\right)(0.31873)(0.12076) + 2\left(\tfrac{2}{3}\right)(0.22002)(0.11968) + 2\left(\tfrac{3}{3}\right)(0.10686)(0.18775) = 0.10090,$$

$$f_S(4) = 2\left(\tfrac{1}{4}\right)(0.31873)(0.10090) + 2\left(\tfrac{2}{4}\right)(0.22002)(0.12076) + 2\left(\tfrac{3}{4}\right)(0.10686)(0.11968) + 2\left(\tfrac{4}{4}\right)(0.06692)(0.18775) = 0.08696.$$

This simple idea can be extended to higher levels of compounding by
repeatedly applying the same concepts. The computer time required to carry
out two applications will be about twice that of one application of (6.14). □
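The two-stage calculation of Example 6.18 can be reproduced with a short script. This is a sketch under the example's stated parameters; the helper name `panjer` and the variable names are illustrative assumptions, not notation from the text. The first call applies the $(a, b, 1)$ recursion with the ETNB as frequency, and the second treats the resulting distribution of $S_1$ as the severity in a compound Poisson recursion.

```python
import math

def panjer(a, b, p0, p1, fx, fs0, n):
    """(a, b, 1) recursion: returns f_S(0..n) given severity pf fx on 0..m."""
    m = len(fx) - 1
    denom = 1.0 - a * fx[0]
    fs = [fs0]
    for x in range(1, n + 1):
        head = (p1 - (a + b) * p0) * (fx[x] if x <= m else 0.0)
        tail = sum((a + b * y / x) * fx[y] * fs[x - y]
                   for y in range(1, min(x, m) + 1))
        fs.append((head + tail) / denom)
    return fs

lam, beta, r = 2.0, 3.0, 0.2
fx = [0.3, 0.5, 0.2]                  # claim sizes 0, 10, 20 in units of 10
n = 4

# Stage 1: compound ETNB distribution of S_1.
a = beta / (1.0 + beta)
b = (r - 1.0) * a
p1 = r * beta / ((1.0 + beta) ** (r + 1.0) - (1.0 + beta))
fs1_0 = (((1.0 + beta * (1.0 - fx[0])) ** (-r) - (1.0 + beta) ** (-r))
         / (1.0 - (1.0 + beta) ** (-r)))
fs1 = panjer(a, b, 0.0, p1, fx, fs1_0, n)

# Stage 2: compound Poisson with S_1 playing the role of the severity.
p0_poisson = math.exp(-lam)
fs = panjer(0.0, lam, p0_poisson, lam * p0_poisson, fs1,
            math.exp(lam * (fs1[0] - 1.0)), n)

print([round(p, 5) for p in fs1])
print([round(p, 5) for p in fs])
```

Only the first `n + 1` values of the intermediate distribution are needed to obtain $f_S(0), \ldots, f_S(n)$, because $f_S(x)$ depends on $f_{S_1}(y)$ for $y \le x$ only.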
distribution when $\lambda^*$ is used as the Poisson mean. If $P^*(z)$ is the pgf of the
aggregate claims using Poisson mean $\lambda^*$, then $P_S(z) = [P^*(z)]^{2^n}$. Hence
one can obtain successively the distributions with pgfs $[P^*(z)]^2$, $[P^*(z)]^4$,
$[P^*(z)]^8, \ldots, [P^*(z)]^{2^n}$ by convoluting the result at each stage with itself. This
requires an additional $n$ convolutions in carrying out the calculations but
involves no approximations. This procedure can be carried out for any frequency


the analogous procedure starts with $r^* = r/2^n$. For the binomial
, the parameter $m$ must be integer valued. A slight modification
Let $m^* = \lfloor m/2^n \rfloor$ when $\lfloor \cdot \rfloor$ indicates the integer part of function.
convolutions are carried out, one still needs to carry out the cal-


For compound frequency distributions,
needs to be closed under convolution.

6.6.3 Numerical stability

Any recursive formula requires accurate computation of values because each
such value will be used in computing subsequent values. Recursive schemes
suffer the risk of errors propagating through all subsequent values and
potentially blowing up. In the recursive formula (6.14), errors are introduced
through rounding or truncation at each stage because computers represent
numbers with a finite number of significant digits. The question about
stability is, “How fast do the errors in the calculations grow as the computed values
are used in successive computations?”

The question of error propagation in recursive formulas has been a
subject of study of numerical analysts. This work has been extended by Panjer
and Wang [104] to study the recursive formula (6.14). The analysis is quite
complicated and well beyond the scope of this book. However, some general
conclusions can be made here.

Errors are introduced in subsequent values through the summation

$$\sum_{y=1}^{x \wedge m} \left(a + \frac{by}{x}\right) f_X(y) f_S(x - y)$$

in recursion (6.14). In the extreme right-hand tail of the distribution of $S$,
this sum is positive (or at least nonnegative), and subsequent values of the
sum will be decreasing. The sum will stay positive, even with rounding errors,
when each of the three factors in each term in the sum is positive. In this case,
the recursive formula is stable, producing relative errors that do not grow fast.
For the Poisson and negative binomial based distributions, the factors in each
term are always positive.

On the other hand, for the binomial distribution, the sum can have negative 
terms because a is negative, b is positive, and y/x is a positive function not 
exceeding 1. In this case, the negative terms can cause the successive values to 
blow up with alternating signs. When this occurs, the nonsensical results are 
immediately obvious. Although this does not happen frequently in practice, 
the reader should be aware of this possibility in models based on the binomial 
distribution. 
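The sign condition on the factor $a + by/x$ is easy to inspect directly. The short sketch below is illustrative only: the function name is hypothetical, and the $(a, b)$ values in the usage note use the standard parameterizations ($a = 0$, $b = \lambda$ for the Poisson; $a = \beta/(1+\beta)$, $b = (r-1)a$ for the negative binomial; $a = -q/(1-q)$, $b = (m+1)q/(1-q)$ for the binomial).

```python
def min_recursion_factor(a, b, x_max):
    """Smallest value of a + b*y/x over 1 <= y <= x <= x_max.
    A negative result means the recursion sum can contain negative terms."""
    return min(a + b * y / x
               for x in range(1, x_max + 1)
               for y in range(1, x + 1))
```

For a Poisson distribution with $\lambda = 2$, or a negative binomial with $r = 2$ and $\beta = 1.5$, `min_recursion_factor` stays positive for any `x_max`. For a binomial with $m = 5$ and $q = 0.3$, the factor turns negative once $x > (m + 1)y$, which is the situation the paragraph above warns about.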

6.6.4 Continuous severity 

The recursive method has been developed for discrete severity distributions, 
while it is customary to use continuous distributions for severity. In the case of 
continuous severities, the analog of the recursion (6.14) is an integral equation, 
the solution of which is the aggregate claims distribution. 

Theorem 6.19 For the $(a, b, 1)$ class of frequency distributions and any
continuous severity distribution with probability on the positive real line, the following









They consider the more general $(a, b, m)$ class of
for arbitrary modification of $m$ initial values of the
le initial term is $p_1 f_X(x)$, not $[p_1 - (a + b)p_0] f_X(x)$
holds for members of the $(a, b, 0)$ class as well,
le form (6.19) are Volterra integral equations of the



been 







proDaDinty Detween -r 丄厂 t ana j/i ana assigns it 
effect, rounds all amounts to the nearest convenient 
an of the distribution. 

licates that discrete probability at x should not be included, 
this will make no difference. 
6.6.5.2 Method of local moment matching In this method we construct an
arithmetic distribution that matches $p$ moments of the arithmetic and the true
severity distributions. Consider an arbitrary interval of length $ph$, denoted
by $[x_k, x_k + ph)$. We will locate point masses $m_0^k, m_1^k, \ldots, m_p^k$ at points
$x_k, x_k + h, \ldots, x_k + ph$ so that the first $p$ moments are preserved. The system of
$p + 1$ equations reflecting these conditions is

$$\sum_{j=0}^{p} (x_k + jh)^r m_j^k = \int_{x_k - 0}^{x_k + ph - 0} x^r\,dF_X(x), \quad r = 0, 1, 2, \ldots, p, \qquad (6.20)$$

where the notation “$-0$” at the limits of the integral indicates that discrete
probability at $x_k$ is to be included but discrete probability at $x_k + ph$ is to be
excluded.

Arrange the intervals so that $x_{k+1} = x_k + ph$ and so the endpoints coincide.
Then the point masses at the endpoints are added together. With $x_0 = 0$, the
resulting discrete distribution has successive probabilities:

$$f_0 = m_0^0, \quad f_1 = m_1^0, \quad f_2 = m_2^0, \ldots,$$
$$f_p = m_p^0 + m_0^1, \quad f_{p+1} = m_1^1, \quad f_{p+2} = m_2^1, \ldots. \qquad (6.21)$$

By summing (6.20) for all possible values of $k$, with $x_0 = 0$, it is clear
that the first $p$ moments are preserved for the entire distribution and that
probabilities add to 1 exactly. It only remains to solve the system of equations
(6.20).

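Theorem 6.20 The solution of (6.20) is

$$m_j^k = \int_{x_k - 0}^{x_k + ph - 0} \prod_{i \ne j} \frac{x - x_k - ih}{(j - i)h}\,dF_X(x), \quad j = 0, 1, \ldots, p. \qquad (6.22)$$

Proof: The Lagrange formula for collocation of a polynomial $f(y)$ at points
$y_0, y_1, \ldots, y_n$ is

$$f(y) = \sum_{j=0}^{n} f(y_j) \prod_{i \ne j} \frac{y - y_i}{y_j - y_i}.$$

Applying this formula to the polynomial $f(y) = y^r$ over the points $x_k, x_k + h,
\ldots, x_k + ph$ yields

$$x^r = \sum_{j=0}^{p} (x_k + jh)^r \prod_{i \ne j} \frac{x - x_k - ih}{(j - i)h}, \quad r = 0, 1, \ldots, p.$$

Integrating over the interval $[x_k, x_k + ph)$ with respect to the severity
distribution results in

$$\int_{x_k - 0}^{x_k + ph - 0} x^r\,dF_X(x) = \sum_{j=0}^{p} (x_k + jh)^r m_j^k,$$

where $m_j^k$ is given by (6.22).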
Hence the solution (6.22) preserves the first $p$ moments. □



$$f_0 = m_0^0 = 5e^{-0.2} - 4 = 0.09365,$$

$$f_j = m_1^{j-1} + m_0^j = 5e^{-0.1(2j-2)} - 10e^{-0.1(2j)} + 5e^{-0.1(2j+2)}.$$

The first few values also are given in Table 6.9. A more direct solution for
matching the first moment is provided in Exercise 6.36. □

This method of local moment matching was introduced by Gerber and
Jones [44] and Gerber [43] and further studied by Panjer and Lutek [103] for
a variety of empirical and analytical severity distributions. In assessing the 
impact of errors on aggregate stop-loss net premiums (aggregate excess-of-loss 
pure premiums), Panjer and Lutek [103] found that two moments were usually 
sufficient and that adding a third moment requirement adds only marginally 
Table 6.9 Discretization of the exponential distribution by two methods

j     f_j rounding    f_j matching
0     0.09516         0.09365
1     0.16402         0.16429
2     0.13429         0.13451
3     0.10995         0.11013
4     0.09002         0.09017
5     0.07370         0.07382
6     0.06034         0.06044
7     0.04940         0.04948
8     0.04045         0.04051
9     0.03311         0.03317
10    0.02711         0.02716
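The two columns of Table 6.9 can be regenerated with a few lines of code. The sketch below assumes an exponential severity with mean 10 and a span of $h = 2$; these values are inferred from the table entries and from $f_0 = 5e^{-0.2} - 4$, not stated in the surviving text. The first-moment matching formula is written in terms of the limited expected value $E(X \wedge x)$, and all function names are illustrative.

```python
import math

def discretize_rounding(cdf, h, n):
    """Method of rounding: f_0 = F(h/2), f_j = F(jh + h/2) - F(jh - h/2)."""
    probs = [cdf(h / 2)]
    for j in range(1, n + 1):
        probs.append(cdf(j * h + h / 2) - cdf(j * h - h / 2))
    return probs

def discretize_first_moment(lev, h, n):
    """Local moment matching with p = 1, where lev(x) = E[X ^ x]:
    f_0 = 1 - lev(h)/h,
    f_j = [2 lev(jh) - lev((j-1)h) - lev((j+1)h)] / h."""
    probs = [1.0 - lev(h) / h]
    for j in range(1, n + 1):
        probs.append((2 * lev(j * h) - lev((j - 1) * h) - lev((j + 1) * h)) / h)
    return probs

theta, h = 10.0, 2.0   # assumed: exponential mean 10, span 2

def exp_cdf(x):
    return 1.0 - math.exp(-x / theta)

def exp_lev(x):
    return theta * (1.0 - math.exp(-x / theta))

rounding = discretize_rounding(exp_cdf, h, 10)
matching = discretize_first_moment(exp_lev, h, 10)
for j in range(11):
    print(j, round(rounding[j], 5), round(matching[j], 5))
```

Rounding preserves total probability on each cell but not the mean, while the matching column reproduces the mean of the severity distribution interval by interval.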


6.6.6 Exercises 

6.36 Show that the method of local moment matching with k = l (matching 
6.38 A weighted average of two Poisson distributions

$$p_k = w\frac{e^{-\lambda_1}\lambda_1^k}{k!} + (1 - w)\frac{e^{-\lambda_2}\lambda_2^k}{k!}$$

has been used by some authors, e.g., Tröbliger [130], to treat drivers as either
“good” or “bad” (see Example 4.64).

(a) Find the pgf $P_N(z)$ of the number of losses in terms of the two pgfs
$P_1(z)$ and $P_2(z)$ of the number of losses of the two types of drivers.

(b) Let fx(x) denote a severity distribution defined on the nonnegative 
integers. How can (6.16) be used to compute the distribution of 
aggregate claims for the entire group? 

(c) Can this be extended to other frequency distributions? 

6.39 A compound Poisson aggregate loss model has five expected claims per 
" ’ *s defined on positi T '^ 一 “ ■** 

■e -5 , determine f) 




the maximum possible single loss. 

(a) Show that for the compound distribution the following backward 
recursion holds: 


Mx + Mj-EiilU^ + b ： 


fx(M- y)fs(x + y) 


為) 綱 


Table 6.11 Data for Exercise 6.40 


6 

7 
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(b) For the binomial (m, q) frequency distribution, how can the above 
formula be used to obtain the distribution of aggregate losses? See 
Panjer and Wang [104]. 

6.42 Aggregate claims are compound Poisson with 入 = 2 ， fx(i) = 組己 
fx(2) = For a premium of 6 an insurer covers aggregate claims and agrees 
to pay a dividend (a refund of premium) equal to the excess, if any, of 75% of 
the premium over 100% of the claims. Determine the excess of premium over 
expected claims and dividends. 

6.43 On a given day, a physician provides medical care to $N_A$ adults and $N_C$
children. Assume $N_A$ and $N_C$ have Poisson distributions with parameters
3 and 2, respectively. The distributions of length of care per patient are as
follows:

           Adult   Child
1 hour     0.4     0.9
2 hour     0.6     0.1

$N_C$, and the lengths of care for all individuals be independent.

that the office income on a

day is less than or equal to 800.


6.44 A group policyholder's aggregate claims, $S$, has a compound Poisson
distribution with $\lambda = 1$ and all claim amounts equal to 2. The insurer pays
the group the following dividend:

$$D = \begin{cases} 6 - S, & S < 6, \\ 0, & S \ge 6. \end{cases}$$

Determine $E[D]$.

6.45 You are given two independent compound Poisson random variables $S_1$
and $S_2$, where $f_j(x)$, $j = 1, 2$, are the two single-claim size distributions. You
axe given Ai = A 2 = 1, /1 ⑴ =1, 紐 d /2(1) = / 2 ⑶ == 0.5. Let Fx{x) 
be the single-claim size distribution function associated with the compound 
distribution 5 = 5i + 知 Calculate ❾ 4 ⑹. 

6.46 The variable $S$ has a compound Poisson claims distribution with the
following:

1. Individual claim amounts equal to

2. $E(S) = 56$.

3. $\mathrm{Var}(S) = 126$.

4. $\lambda = 29$.

Determine the expected number of claims of size 2.

6.47 For a compound Poisson distribution with positive integer claim amounts,
the probability function follows:

$$f_S(x) = \frac{1}{x}\left[0.16 f_S(x - 1) + k f_S(x - 2) + 0.72 f_S(x - 3)\right], \quad x = 1, 2, 3, \ldots.$$

The expected value of aggregate claims is 1.68. Determine the expected
number of claims.

6.48 For a portfolio of policies you are given the following:

1. The number of claims has a Poisson distribution.

2. Claim amounts can be 1, 2, or 3.

3. A stop-loss reinsurance contract has net premiums for various deductibles
as given in Table 6.12.

Table 6.12 Data for Exercise 6.48

Deductible    Net premium
4             0.20
5             0.10
6             0.04
7             0.02

Determine the probability that aggregate claims will be either 5 or 6.

6.49 For group disability income insurance, the expected number of
disabilities per year is 1 per 100 lives covered. The continuance (survival) function
for the length of a disability in days, $Y$, is

$$\Pr(Y > y) = 1 - \frac{y}{10}, \quad y = 0, 1, \ldots, 10.$$

The benefit is 20 per day following a five-day waiting period. Using a
compound Poisson distribution, determine the variance of aggregate claims for a
group of 1,500 independent lives.

6.50 A population has two classes of drivers. The number of accidents per
individual driver has a geometric distribution. For a driver selected at random
from Class I, the geometric distribution parameter has a uniform distribution
over the interval (0,1). Twenty-five percent of the drivers are in Class I. All
drivers in Class II have expected number of claims 0.25. For a driver selected
at random from this population, determine the probability of exactly two
accidents.

Note: The following two exercises require the use of a computer.

6.51 A policy covers physical damage incurred by the trucks in a company's
fleet. The number of losses in a year has a Poisson distribution with $\lambda = 5$.


$\theta = 2{,}500$. The insurance
Determine the probability
of 100 and the method of



6.52 An individual has purchased health insurance for which he pays 10 for
each physician visit and 5 for each prescription. The probability that a
payment will be 10 is 0.25, and the probability that it will be 5 is 0.75. The
total number of payments per year has the Poisson-Poisson (Neyman Type
A) distribution with $\lambda_1 = 10$ and $\lambda_2 = 4$. Determine the probability that
total payments in one year will exceed 400. Compare your answer to a normal
approximation.

6.53 Demonstrate that if the exponential distribution is discretized by the 
method of rounding, the resulting discrete distribution is a ZM geometric 
distribution. 


6.7 THE IMPACT OF INDIVIDUAL POLICY MODIFICATIONS ON 
AGGREGATE PAYMENTS 


In Section 5.6 the manner in which individual deductibles (both ordinary and 



probability v. The individual ground-up lot 
icy modifications (including deductibles) app 
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made. Individual payments may then be viewed on a per-loss basis, where 

,i - _ 丄 _u 八 4 •/ K-*r "VL -nr：]! n if InGQ rpsmit.s in 



$Y^L$ given that $Y^L > 0$. Notationally, we write $Y^P = Y^L \mid Y^L > 0$. Therefore,
the cumulative distribution functions are related by

$$F_{Y^L}(y) = (1 - v) + vF_{Y^P}(y), \quad y \ge 0,$$

because $1 - v = \Pr(Y^L = 0) = F_{Y^L}(0)$ (recall that $Y^L$ has a discrete
probability mass point $1 - v$ at 0, even if $X$ and hence $Y^P$ and $Y^L$ have continuous
probability density functions for $y > 0$). The moment generating functions of
$Y^L$ and $Y^P$ are thus related by

$$M_{Y^L}(t) = (1 - v) + vM_{Y^P}(t), \qquad (6.23)$$

which may be restated in terms of expectations as

$$E(e^{tY^L}) = E(e^{tY^L} \mid Y^L = 0)\Pr(Y^L = 0) + E(e^{tY^L} \mid Y^L > 0)\Pr(Y^L > 0).$$

It follows from Section 5.6 that the number of losses $N^L$ and the number
of payments $N^P$ are related through their probability generating functions by

$$P_{N^P}(z) = P_{N^L}(1 - v + vz), \qquad (6.24)$$

where $P_{N^P}(z) = E(z^{N^P})$ and $P_{N^L}(z) = E(z^{N^L})$.

We now turn to the analysis of the aggregate payments. On a per-loss
basis, the total payments may be expressed as $S = Y_1^L + Y_2^L + \cdots + Y_{N^L}^L$
with $S = 0$ if $N^L = 0$ and where $Y_j^L$ is the payment amount on the $j$th loss.
Alternatively, ignoring losses on which no payment is made, we may express
the total payments on a per-payment basis as $S = Y_1^P + Y_2^P + \cdots + Y_{N^P}^P$
with $S = 0$ if $N^P = 0$, and $Y_j^P$ is the payment amount on the $j$th loss, which
results in a nonzero payment. Clearly, $S$ may be represented in two distinct
ways on an aggregate basis. Of course, the moment generating function of $S$
on a per-loss basis is

$$M_S(t) = E(e^{tS}) = P_{N^L}[M_{Y^L}(t)], \qquad (6.25)$$

whereas on a per-payment basis we have

$$M_S(t) = E(e^{tS}) = P_{N^P}[M_{Y^P}(t)]. \qquad (6.26)$$

Obviously, (6.25) and (6.26) are equal, as may be seen from (6.23) and (6.24).
That is,

$$P_{N^L}[M_{Y^L}(t)] = P_{N^L}[1 - v + vM_{Y^P}(t)] = P_{N^P}[M_{Y^P}(t)].$$
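The equality of the per-loss and per-payment representations can be illustrated numerically for a Poisson number of losses, in which case $N^P$ is Poisson with mean $\lambda v$. The sketch below uses a made-up discrete per-loss payment distribution with a point mass $1 - v$ at zero; the numbers and names are hypothetical and chosen only for illustration.

```python
import math

def compound_poisson(lam, fx, n):
    """Compound Poisson pf f_S(0..n) by the Poisson recursion."""
    m = len(fx) - 1
    fs = [math.exp(-lam * (1.0 - fx[0]))]
    for x in range(1, n + 1):
        fs.append(lam / x * sum(y * fx[y] * fs[x - y]
                                for y in range(1, min(x, m) + 1)))
    return fs

lam = 3.0
f_loss = [0.6, 0.1, 0.2, 0.1]          # hypothetical pf of Y^L; mass 1 - v at 0
v = 1.0 - f_loss[0]                    # probability a loss produces a payment
f_pay = [0.0] + [p / v for p in f_loss[1:]]   # pf of Y^P = Y^L | Y^L > 0

per_loss = compound_poisson(lam, f_loss, 30)          # N^L with Y^L
per_payment = compound_poisson(lam * v, f_pay, 30)    # N^P with Y^P

print(max(abs(u - w) for u, w in zip(per_loss, per_payment)))
```

The printed maximum difference should be at the level of floating-point rounding, reflecting that both representations describe the same distribution of $S$.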
Consequently, any analysis of the aggregate payments $S$ may be done on
either a per-loss basis [with compound representation (6.25) for the moment
generating function] or on a per-payment basis [with (6.26) as the compound
moment generating function]. The basis selected should obviously be
determined by whatever is more suitable for the particular situation at hand. While
by no means a hard-and-fast rule, the authors have found it more convenient


l Section 5.5 for the individual mean and variance axe 
[the mean and variance of the aggregate payments S n 
laese and (6.6) but with N replaced by N L and X by 
the other hand, if the (approximated) distribution of 
payment basis is normally to be preferred. The reason 




_ 丽搬 腿 iwjp j 


of zero probabilities in the distribution of Y L , particularly if a franchise de¬ 
ductible is employed. Also, for convenience, we normally elect to apply policy 
modifications to the individual loss distribution first and then discretize (if 
necessary), rather than discretizing and then applying policy modifications to 
the discretized distributions. This issue is only relevant if the deductible and 
policy limit are not integer multiples of the discretization span, however. The 
following example illustrates these ideas. 

Example 6.22 The number of ground-up losses is Poisson distributed with
mean $\lambda = 3$. The individual loss distribution is Pareto with parameters $\alpha = 4$
and $\theta = 10$. An individual ordinary deductible of 6, coinsurance of 75%,
and an individual loss limit of 24 (before application of the deductible and
coinsurance) are all applied. Determine the mean, variance, and distribution
of aggregate payments.

We will first compute the mean and variance on a per-loss basis. The mean
number of losses is $E(N^L) = 3$, and the mean individual payment on a
per-loss basis is (using Theorem 5.13 with $r = 0$ and the Pareto distribution)

$$E(Y^L) = 0.75[E(X \wedge 24) - E(X \wedge 6)] = 0.75(3.2485 - 2.5195) = 0.54675.$$

The mean of the aggregate payments is thus

$$E(S) = E(N^L)E(Y^L) = (3)(0.54675) = 1.64.$$

The second moment of the individual payments on a per-loss basis is, using
Theorem 5.14 with $r = 0$ and the Pareto distribution,

$$E[(Y^L)^2] = (0.75)^2\{E[(X \wedge 24)^2] - E[(X \wedge 6)^2] - 2(6)E(X \wedge 24) + 2(6)E(X \wedge 6)\}$$

$$= (0.75)^2[26.3790 - 10.5469 - 12(3.2485) + 12(2.5195)] = 3.98481.$$
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In order to compute the variance of aggregate payments, we do not need 
to explicitly determine Var(y L ) because S is compound Poisson distributed, 
which implies [using (4.27], for example) that 

Vax(5) = AE [(y L ) 2 ] = 3(3.98481) = 11.9544 = (3.46) 2 . 

In order to compute the (approximate) distribution of S, we will use the
per-payment basis. First note that $v = \Pr(X > 6) = [10/(10 + 6)]^4 = 0.15259$,
and the number of payments $N^P$ is Poisson distributed with mean
$E(N^P) = \lambda v = 3(0.15259) = 0.45776$. Let $Z = X - 6 \mid X > 6$, so that Z is the individual
payment random variable with only a deductible of 6. Then

\[
\Pr(Z > z) = \frac{\Pr(X > z + 6)}{\Pr(X > 6)}.
\]

With coinsurance of 75%, $Y^P = 0.75Z$ has cumulative distribution function

\[
F_{Y^P}(y) = 1 - \Pr(0.75Z > y) = 1 - \frac{\Pr(X > 6 + y/0.75)}{\Pr(X > 6)}.
\]

That is, for y less than the maximum payment of (0.75)(24 - 6) = 13.5,

\[
F_{Y^P}(y) = \frac{\Pr(X > 6) - \Pr(X > 6 + y/0.75)}{\Pr(X > 6)}, \quad y < 13.5,
\]

and $F_{Y^P}(y) = 1$ for $y \ge 13.5$. We then discretize the distribution of $Y^P$
using the method of rounding with a span of 2.25, which gives
$f_0 = F_{Y^P}(1.125) = 0.30124$, $f_1 = F_{Y^P}(3.375) - F_{Y^P}(1.125) = 0.32768$, and so on.
Because there is a probability mass at the maximum payment,
care must be exercised in evaluation of $f_6$, and we have $f_6 = F_{Y^P}(14.625) - F_{Y^P}(12.375) = 1 - 0.94126 = 0.05874$. Then $f_n = 1 - 1 = 0$ for $n = 7, 8, \ldots$.
The approximate distribution of S may then be computed using the compound
Poisson recursive formula, namely, $g_0 = e^{-0.45776(1 - 0.30124)} = 0.72625$, and

\[
g_k = \frac{0.45776}{k} \sum_{j=1}^{k} j f_j g_{k-j}, \quad k = 1, 2, 3, \ldots.
\]

Thus, $g_1 = (0.45776)(1)(0.32768)(0.72625) = 0.10894$, for example. □
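The calculations in Example 6.22 can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names are illustrative, the second limited moment is obtained by integrating $2s\Pr(X > s)$ in closed form, and the span of 2.25 with the method of rounding is the one implied by the values 12.375 and 14.625 used above.

```python
import math

ALPHA, THETA = 4.0, 10.0   # Pareto parameters from the example
LAM = 3.0                  # Poisson mean for ground-up losses
D, U, C = 6.0, 24.0, 0.75  # deductible, loss limit, coinsurance


def sf(x):
    """Pareto survival function Pr(X > x)."""
    return (THETA / (THETA + x)) ** ALPHA


def lev1(x):
    """Limited expected value E(X ^ x) for the Pareto distribution."""
    return THETA / (ALPHA - 1) * (1 - (THETA / (THETA + x)) ** (ALPHA - 1))


def lev2(x):
    """E[(X ^ x)^2], the integral of 2 s Pr(X > s) over (0, x)."""
    def antiderivative(u):  # substitution u = THETA + s
        return u ** (2 - ALPHA) / (2 - ALPHA) - THETA * u ** (1 - ALPHA) / (1 - ALPHA)
    return 2 * THETA ** ALPHA * (antiderivative(THETA + x) - antiderivative(THETA))


# Per-loss moments and the compound Poisson mean and variance.
mean_yl = C * (lev1(U) - lev1(D))
second_yl = C ** 2 * (lev2(U) - lev2(D) - 2 * D * (lev1(U) - lev1(D)))
mean_s = LAM * mean_yl
var_s = LAM * second_yl

# Per-payment basis: thinned Poisson mean and payment cdf.
v = sf(D)
lam_p = LAM * v
max_pay = C * (U - D)


def cdf_yp(y):
    """cdf of the payment amount, with a mass point at the maximum payment."""
    if y >= max_pay:
        return 1.0
    return 1.0 - sf(D + y / C) / v


def discretize(h, n):
    """Method of rounding with span h, returning f_0, ..., f_{n-1}."""
    probs = [cdf_yp(h / 2)]
    for j in range(1, n):
        probs.append(cdf_yp((j + 0.5) * h) - cdf_yp((j - 0.5) * h))
    return probs


def compound_poisson(lam, f, kmax):
    """Compound Poisson recursion for g_0, ..., g_kmax."""
    g = [math.exp(-lam * (1 - f[0]))]
    for k in range(1, kmax + 1):
        upper = min(k, len(f) - 1)
        total = sum(j * f[j] * g[k - j] for j in range(1, upper + 1))
        g.append(lam / k * total)
    return g


f = discretize(2.25, 8)
g = compound_poisson(lam_p, f, 20)
print(round(mean_s, 2), round(var_s, 4), round(g[0], 5), round(g[1], 5))
```

The variance computed this way differs from 11.9544 in the fourth decimal place only because the text carries rounded intermediate values.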

6.7.1 Exercises 

6.54 Suppose that the number of ground-up losses $N^L$ has probability generating
function $P_{N^L}(z) = B[\theta(z - 1)]$, where $\theta$ is a parameter and B is
functionally independent of $\theta$. The individual ground-up loss distribution is
exponential with cumulative distribution function $F_X(x) = 1 - e^{-\mu x}$, $x \ge 0$.
Individual losses are subject to an ordinary deductible of d and coinsurance of
$\alpha$. Demonstrate that the aggregate payments, on a per-payment basis, have
compound moment generating function given by (6.26), where $N^P$ has the
same distribution as $N^L$ but with $\theta$ replaced by $\theta e^{-\mu d}$ and $Y^P$ has the same
distribution as X but with $\mu$ replaced by $\mu/\alpha$.

6.55 A ground-up model of individual losses has the gamma distribution
with parameters $\alpha = 2$ and $\theta = 100$. The number of losses has the negative
binomial distribution with $r = 2$ and $\beta = 1.5$. An ordinary deductible of 50
and a loss limit of 175 (before imposition of the deductible) are applied to
each individual loss.

(a) Determine the mean and variance of the aggregate payments on a 
per-loss basis. 

(b) Determine the distribution of the number of payments. 

(c) Determine the cumulative distribution function of the amount Y p 
of a payment given that a payment is made. 

(d) Discretize the severity distribution from (c) using the method of 
rounding and a span of 40. 

(e) Use the recursive formula to calculate the discretized distribution 
of aggregate payments up to a discretized amount paid of 120. 

6.8 CALCULATIONS WITH APPROXIMATE DISTRIBUTIONS

Whenever the severity distribution is calculated using an approximate method,
the result is, of course, an approximation to the true aggregate distribution.
In particular, the true aggregate distribution is often continuous (except,
perhaps, with discrete probability at zero or at an aggregate censoring limit) while
the approximate distribution either is discrete with probability at equally
spaced values as with recursion and the Fast Fourier Transform (FFT), is discrete
with probability 1/n at arbitrary values as with simulation, or has a piecewise
linear distribution function as with Heckman-Meyers. In this section we
introduce reasonable ways to obtain values of $F_S(x)$ and $E[(S \wedge x)^k]$ from those
approximating distributions. In all cases we assume that the true distribution
of aggregate payments is continuous, except perhaps with discrete probability
at S = 0.

6.8.1 Arithmetic distributions 

For recursion and the FFT, the approximating distribution can be written
as $p_0, p_1, \ldots$, where $p_j = \Pr(S^* = jh)$ and $S^*$ refers to the approximating
distribution. While several methods of undiscretizing this distribution are
possible, we will introduce only one. It assumes we can obtain $g_0 = \Pr(S = 0)$,
the true probability that aggregate payments are zero. The method is based
on constructing a continuous approximation to $S^*$ by assuming the probability
$p_j$ is uniformly spread over the interval $(j - \frac{1}{2})h$ to $(j + \frac{1}{2})h$ for $j = 1, 2, \ldots$.
For the interval from 0 to h/2, a discrete probability of $g_0$ is placed at zero
and the remaining probability, $p_0 - g_0$, is spread uniformly over the interval.
Let $S^{**}$ be the random variable with this mixed distribution. All quantities
of interest are then computed using $S^{**}$.

Table 6.13 Discrete approximation to the aggregate payments distribution

  j     x    f_X(x)      p_j = f_{S*}(x)
  0     0    0.009934    0.335556
  1     2    0.019605    0.004415
  2     4    0.019216    0.004386
  3     6    0.018836    0.004356
  4     8    0.018463    0.004327
  5    10    0.018097    0.004299
  6    12    0.017739    0.004270
  7    14    0.017388    0.004242
  8    16    0.017043    0.004214
  9    18    0.016706    0.004186
 10    20    0.016375    0.004158

Example 6.23 Let N have the geometric distribution with $\beta = 2$ and let X
have the exponential distribution with $\theta = 100$. Use recursion with a span
of 2 to approximate the aggregate distribution and then obtain a continuous
approximation.

The exponential distribution was discretized using the method which
preserves the first moment. The probabilities appear in Table 6.13. Also
presented are the aggregate probabilities computed using the recursive formula.
We also note that $g_0 = \Pr(N = 0) = (1 + \beta)^{-1} = \frac{1}{3}$. For $j = 1, 2, \ldots$ the
continuous approximation has pdf $f_{S^{**}}(x) = f_{S^*}(2j)/2$, $2j - 1 < x \le 2j + 1$. We
also have $\Pr(S^{**} = 0) = \frac{1}{3}$ and $f_{S^{**}}(x) = (0.335556 - \frac{1}{3})/1 = 0.002223$,
$0 < x \le 1$. □
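The entries of Table 6.13 can be reproduced with a short script. This is a sketch under stated assumptions: the mean-preserving discretization is written in terms of the limited expected value $E(X \wedge x) = \theta(1 - e^{-x/\theta})$, the geometric frequency is treated as the (a, b, 0) member with $a = \beta/(1+\beta)$ and $b = 0$, and all function names are illustrative.

```python
import math

THETA = 100.0   # exponential mean
BETA = 2.0      # geometric parameter
H = 2.0         # span


def lev(x):
    """E(X ^ x) for the exponential distribution."""
    return THETA * (1 - math.exp(-x / THETA))


def discretize_mean_preserving(h, n):
    """Probabilities at 0, h, 2h, ... that preserve the first moment locally."""
    probs = [1 - lev(h) / h]
    for j in range(1, n):
        probs.append((2 * lev(j * h) - lev((j - 1) * h) - lev((j + 1) * h)) / h)
    return probs


def compound_geometric(beta, f, kmax):
    """Recursion for a compound geometric distribution (a = beta/(1+beta), b = 0)."""
    a = beta / (1 + beta)
    p = [1 / (1 + beta * (1 - f[0]))]   # pgf of N evaluated at f_0
    for k in range(1, kmax + 1):
        total = sum(f[j] * p[k - j] for j in range(1, k + 1))
        p.append(a * total / (1 - a * f[0]))
    return p


f = discretize_mean_preserving(H, 11)
p = compound_geometric(BETA, f, 10)
for j in range(11):
    print(j, int(j * H), round(f[j], 6), round(p[j], 6))
```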

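Returning to the original problem, it is possible to work out the general
formulas for the basic quantities. For the cdf,

\[
F_{S^{**}}(x) = g_0 + \int_0^x \frac{2(p_0 - g_0)}{h}\,ds
= g_0 + \frac{2x}{h}(p_0 - g_0), \quad 0 \le x \le \frac{h}{2},
\]

and

\[
F_{S^{**}}(x) = \sum_{i=0}^{j-1} p_i + \int_{(j-1/2)h}^{x} \frac{p_j}{h}\,ds
= \sum_{i=0}^{j-1} p_i + \frac{x - (j - 1/2)h}{h}\,p_j, \quad \left(j - \tfrac{1}{2}\right)h < x \le \left(j + \tfrac{1}{2}\right)h.
\]

For the limited expected value (LEV),

\[
\begin{aligned}
E[(S^{**} \wedge x)^k] &= 0^k g_0 + \int_0^x s^k \frac{2(p_0 - g_0)}{h}\,ds + x^k[1 - F_{S^{**}}(x)] \\
&= \frac{2x^{k+1}(p_0 - g_0)}{h(k+1)} + x^k[1 - F_{S^{**}}(x)], \quad 0 < x \le \frac{h}{2},
\end{aligned}
\]

and

\[
\begin{aligned}
E[(S^{**} \wedge x)^k] &= 0^k g_0 + \int_0^{h/2} s^k \frac{2(p_0 - g_0)}{h}\,ds
+ \sum_{i=1}^{j-1} \int_{(i-1/2)h}^{(i+1/2)h} s^k \frac{p_i}{h}\,ds
+ \int_{(j-1/2)h}^{x} s^k \frac{p_j}{h}\,ds + x^k[1 - F_{S^{**}}(x)] \\
&= \frac{(h/2)^k (p_0 - g_0)}{k+1}
+ \sum_{i=1}^{j-1} \frac{h^k[(i + 1/2)^{k+1} - (i - 1/2)^{k+1}]}{k+1}\,p_i \\
&\quad + \frac{x^{k+1} - [(j - 1/2)h]^{k+1}}{h(k+1)}\,p_j
+ x^k[1 - F_{S^{**}}(x)], \quad \left(j - \tfrac{1}{2}\right)h < x \le \left(j + \tfrac{1}{2}\right)h.
\end{aligned}
\]

For k = 1 this reduces to

\[
E(S^{**} \wedge x) =
\begin{cases}
x(1 - g_0) - \dfrac{x^2}{h}(p_0 - g_0), & 0 < x \le \dfrac{h}{2}, \\[2ex]
\dfrac{h}{4}(p_0 - g_0) + \displaystyle\sum_{i=1}^{j-1} ih\,p_i + \dfrac{x^2 - [(j - 1/2)h]^2}{2h}\,p_j + x[1 - F_{S^{**}}(x)], & \left(j - \tfrac{1}{2}\right)h < x \le \left(j + \tfrac{1}{2}\right)h.
\end{cases}
\]

These formulas are summarized in Appendix E.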
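As an illustration of the cdf and k = 1 LEV formulas for the smoothed variable $S^{**}$, the sketch below evaluates them with the aggregate probabilities listed in Table 6.13 (span 2, $g_0 = 1/3$) and compares them with the exact compound geometric-exponential results $F_S(x) = 1 - \frac{2}{3}e^{-x/300}$ and $E(S \wedge x) = 200(1 - e^{-x/300})$. The function names are illustrative and the table values are hard-coded as printed.

```python
import math

H = 2.0            # span used in Table 6.13
G0 = 1.0 / 3.0     # true probability of zero aggregate payments
P = [0.335556, 0.004415, 0.004386, 0.004356, 0.004327, 0.004299,
     0.004270, 0.004242, 0.004214, 0.004186, 0.004158]  # p_0, ..., p_10


def interval_index(x):
    """Index j with (j - 1/2)h < x <= (j + 1/2)h, for x > h/2."""
    return math.ceil(x / H - 0.5)


def smoothed_cdf(x):
    if x <= H / 2:
        return G0 + 2 * x / H * (P[0] - G0)
    j = interval_index(x)
    return sum(P[:j]) + (x - (j - 0.5) * H) / H * P[j]


def smoothed_lev(x):
    """E(S** ^ x), the k = 1 case."""
    if x <= H / 2:
        return x * (1 - G0) - x * x / H * (P[0] - G0)
    j = interval_index(x)
    head = H / 4 * (P[0] - G0) + sum(i * H * P[i] for i in range(1, j))
    tail = (x * x - ((j - 0.5) * H) ** 2) / (2 * H) * P[j]
    return head + tail + x * (1 - smoothed_cdf(x))


def exact_cdf(x):
    return 1 - (2 / 3) * math.exp(-x / 300)


def exact_lev(x):
    return 200 * (1 - math.exp(-x / 300))


for x in range(1, 11):
    print(x, round(smoothed_cdf(x), 6), round(exact_cdf(x), 6),
          round(smoothed_lev(x), 5), round(exact_lev(x), 5))
```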

Example 6.24 (Example 6.23 continued) Compute the cdf and LEV at
integral values from 1 to 10 using $S^*$, $S^{**}$, and the exact distribution of aggregate
losses.

The exact distribution is available for this example. It was developed in
Example 6.12 where it was determined that $\Pr(S = 0) = (1 + \beta)^{-1} = \frac{1}{3}$ and
the pdf for the continuous part is

\[
f_S(x) = \frac{\beta}{\theta(1 + \beta)^2} \exp\left[-\frac{x}{\theta(1 + \beta)}\right] = \frac{2}{900} e^{-x/300}, \quad x > 0.
\]

From this we have

\[
F_S(x) = \frac{1}{3} + \int_0^x \frac{2}{900} e^{-s/300}\,ds = 1 - \frac{2}{3} e^{-x/300}
\]

and

\[
E(S \wedge x) = \int_0^x \frac{2s}{900} e^{-s/300}\,ds + x\,\frac{2}{3} e^{-x/300} = 200(1 - e^{-x/300}).
\]

The requested values are given in Table 6.14.

Table 6.14 Comparison of true aggregate payment values and two approximations


6.8.2 Empirical distributions 

When the approximate distribution is obtained by simulation (the simulation 
process is discussed in Chapter 17), the result is an empirical distribution. Un¬ 
like approximations produced by recursion or the FFT, simulation does not 
place the probabilities at equally spaced values. This makes it less clear how 
the approximate distribution should be smoothed. On the other hand, simu¬ 
lation usually involves tens or hundreds of thousands of points, and therefore 
the individual points are likely to be close to each other. For these reasons it 





Table 6.15 Simulated values of aggregate losses 


E[(5 * 八养二 


-x k [l-Fs^(x)]. 


Example 6.25 (Example 6.23 continued) Simulate 1,000 observations from
the compound model with geometric frequency and exponential severity. Use
the results to obtain values of the cdf and LEV for the integers from 1 to 10.
The small sample size was selected so that only aboutSO values between zero 
and 10 (not including zero) are expected. 

The simulations produced an aggregate payment of zero 331 times. The set
of nonzero values that were less than 10 plus the first value past 10 are
presented in Table 6.15. Other than zero, none of the values appeared more than
once in the simulation. The requested values from the empirical distribution
and the true values are given in Table 6.16. □
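A simulation of the kind described in Example 6.25 can be sketched as follows. This is an illustrative sketch, not the authors' procedure: the seed, the inversion formula for the geometric count, and the function names are assumptions, and a different random stream will not reproduce the 331 zeros reported above.

```python
import math
import random

BETA, THETA = 2.0, 100.0   # geometric and exponential parameters


def simulate_aggregate(n_sims, seed=12345):
    """Simulate aggregate payments for geometric frequency, exponential severity."""
    rng = random.Random(seed)
    ratio = BETA / (1 + BETA)
    totals = []
    for _ in range(n_sims):
        u = 1.0 - rng.random()                          # uniform on (0, 1]
        n_claims = int(math.log(u) / math.log(ratio))   # Pr(N >= n) = ratio**n
        totals.append(sum(rng.expovariate(1 / THETA) for _ in range(n_claims)))
    return totals


def empirical_cdf(sample, x):
    """Proportion of simulated values not exceeding x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def empirical_lev(sample, x):
    """Sample mean of min(S, x)."""
    return sum(min(s, x) for s in sample) / len(sample)


sample = simulate_aggregate(1000)
for x in range(1, 11):
    print(x, round(empirical_cdf(sample, x), 3), round(empirical_lev(sample, x), 4))
```

With only 1,000 simulated values the empirical cdf is a coarse step function near zero, which is the situation the smoothing discussion addresses.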
Table 6.16 Empirical and smoothed values from a simulation

x    F_{S*}(x)    F_S(x)    E(S* ∧ x)    E(S ∧ x)



no probability

smoothed distribution is to connect these points with straight lines. Let $S^{\#\#}$
be the random variable with this particular cdf. Intermediate values of the
cdf of $S^{\#\#}$ are found by interpolation.

The formula for the limited expected value is (for xj-i <x< Xj) 


E[(5 ## A x) fe ] = JZ 


ds + i fc [l - F s ## (x)] 


^ (fc + lXii-ii—O 

+ (fc + l){xj - Xj-i) 

+x k L _ (gjl + (Xj - x)Fh 


E (，十 

2=2 

, - x i-i) F j + ( x -i ~ x ) F j-i] 
6.8.4 Exercises

6.56 Let the frequency (of losses) distribution be negative binomial with
$r = 2$ and $\beta = 2$. Let the severity distribution (of losses) have the gamma
distribution with $\alpha = 4$ and $\theta = 25$. Determine $F_S(200)$ and $E(S \wedge 200)$ for
an ordinary per-loss deductible of 25. Use the recursive formula to obtain the
aggregate distribution and use a discretization interval of 5 with the method
of rounding to discretize the severity distribution.

6.57 (Exercise 6.51 continued) Recall that the number of claims has a Poisson
distribution with $\lambda = 5$ and the amount of a single claim has a gamma
distribution with $\alpha = 0.5$ and $\theta = 2,500$. Determine the mean, standard
deviation, and 90th percentile of payments by the insurance company under each
of the following coverages. Any computational method may be used.

(a) A maximum aggregate payment of 20,000. 

(b) A per-claim ordinary deductible of 100 and a per claim maximum 
payment of 10,000. There is no aggregate maximum payment. 

(c) A per-claim ordinary deductible of 100 with no maximum payment. 
There is an aggregate ordinary deductible of 15,000, an aggregate 
coinsurance factor of 0.8, and a maximum insurance payment of 
20,000. This corresponds to an aggregate reinsurance provision. 

6.58 (Exercise 6.52 continued) Recall that the number of payments has the
Poisson-Poisson distribution with $\lambda_1 = 10$ and $\lambda_2 = 4$ while the payment per
claim by the insured is 5 with probability 0.75 and 10 with probability 0.25.
Determine the expected payment by the insured under each of the following
situations. Any computational method may be used.

(a) A maximum payment of 400. 

(b) A coinsurance arrangement where the insured pays 100% up to an 
aggregate total of 300 and then pays 20% of aggregate payments 
above 300. 


6.9 INVERSION METHODS

Inversion methods discussed in this section are used to obtain numerically 
the probability function, or some related function such as a net stop-loss 
premium (aggregate excess-of-loss pure premium), from a known expression 
for a transform, such as the pgf, mgf, or cf of the desired function. 

Compound distributions lend themselves naturally to this approach
because their transforms are compound functions and are easily evaluated when
both frequency and severity components are known. The pgf and cf of the
aggregate loss distribution are

\[
P_S(z) = P_N[P_X(z)]
\]

and

\[
\varphi_S(z) = E[e^{iSz}] = P_N[\varphi_X(z)], \tag{6.28}
\]

respectively. The characteristic function always exists and is unique. Con¬ 
versely, for a given characteristic function, there always exists a unique dis¬ 
tribution. The objective of inversion methods is to obtain the distribution 
numerically from the characteristic function (6.28). 

It is worth mentioning that there has recently been much research in other
areas of applied probability on obtaining the distribution numerically from
the associated Laplace-Stieltjes transform. These techniques are applicable
to the evaluation of compound distributions in the present context but will
not be discussed further here. A good survey is [2], pp. 257-323.



6.9.1 Fast Fourier transform 

The FFT is an algorithm that can be used for inverting characteristic functions 
to obtain densities of discrete random variables. The FFT comes from the 
field of signal processing. It was first used for the inversion of characteristic 
functions of compound distributions by Bertram [14] and is explained in detail 
with applications to aggregate loss calculation by Robertson [111]. 


Definition 6.26 For any continuous function f(x), the Fourier transform
is the mapping

\[
\tilde{f}(z) = \int_{-\infty}^{\infty} f(x) e^{izx}\,dx. \tag{6.29}
\]

The original function can be recovered from its Fourier transform as

\[
f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \tilde{f}(z) e^{-izx}\,dz.
\]

When f(x) is a probability density function, $\tilde{f}(z)$ is its characteristic
function. For our applications, f(x) will be real valued. From (6.29), $\tilde{f}(z)$ is
complex valued. When f(x) is a probability function of a discrete (or mixed)
distribution, the definitions can be easily generalized (see, for example, Fisz
[38]).

Definition 6.27 Let $f_x$ denote a function defined for all integer values of x
that is periodic with period length n (that is, $f_{x+n} = f_x$ for all x). For the
vector $(f_0, f_1, \ldots, f_{n-1})$, the discrete Fourier transform is the mapping
$\tilde{f}_x$, $x = \ldots, -1, 0, 1, \ldots$, defined by

\[
\tilde{f}_k = \sum_{j=0}^{n-1} f_j \exp\left(\frac{2\pi i}{n} jk\right), \quad k = \ldots, -1, 0, 1, \ldots. \tag{6.30}
\]

This mapping is bijective. In addition, $\tilde{f}_k$ is also periodic with period length
n. The inverse mapping is

\[
f_j = \frac{1}{n} \sum_{k=0}^{n-1} \tilde{f}_k \exp\left(-\frac{2\pi i}{n} kj\right), \quad j = \ldots, -1, 0, 1, \ldots. \tag{6.31}
\]

This inverse mapping recovers the values of the original function.

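Because of the periodic nature of f and $\tilde{f}$, we can think of the discrete
Fourier transform as a bijective mapping of n points into n points. From
(6.30), it is clear that, in order to obtain n values of $\tilde{f}_k$, the number of terms
that need to be evaluated is of order $n^2$, that is, $O(n^2)$.

The FFT reduces this count by exploiting the fact that a discrete Fourier
transform of length n can be written as the sum of two discrete transforms,
each of length n/2, the first consisting of the
even-numbered points and the second consisting of the odd-numbered
points:

\[
\begin{aligned}
\tilde{f}_k &= \sum_{j=0}^{n-1} f_j \exp\left(\frac{2\pi i}{n} jk\right) \\
&= \sum_{j=0}^{n/2-1} f_{2j} \exp\left(\frac{2\pi i}{n} 2jk\right) + \sum_{j=0}^{n/2-1} f_{2j+1} \exp\left[\frac{2\pi i}{n} (2j + 1)k\right] \\
&= \sum_{j=0}^{m-1} f_{2j} \exp\left(\frac{2\pi i}{m} jk\right) + \exp\left(\frac{2\pi i}{n} k\right) \sum_{j=0}^{m-1} f_{2j+1} \exp\left(\frac{2\pi i}{m} jk\right),
\end{aligned}
\]

where m = n/2. Hence

\[
\tilde{f}_k = \tilde{f}_k^{a} + \exp\left(\frac{2\pi i}{n} k\right) \tilde{f}_k^{b}, \tag{6.32}
\]

where $\tilde{f}^{a}$ and $\tilde{f}^{b}$ denote the length-m transforms of the even-numbered and
odd-numbered points. These
can, in turn, be written as the sum of two transforms of length m/2.
This can be continued successively. For the lengths n/2, m/2, ... to be
integers, n must be a power of 2, say $n = 2^r$. The successive splitting will then
result, after r times, in transforms of length 1. Knowing the transform of
length 1 will allow one to successively compose the transforms of length 2,
$2^2, 2^3, \ldots, 2^r$ by simple addition using (6.32). Details of the methodology are
found in Press et al. [107].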
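The even/odd splitting can be illustrated with a short recursive routine. The following is a teaching sketch rather than an optimized FFT: it uses the sign convention of (6.30), assumes the input length is a power of 2, and checks the result against the direct $O(n^2)$ evaluation. The function names are illustrative.

```python
import cmath
import math


def dft(f):
    """Direct evaluation of the discrete Fourier transform, O(n^2) terms."""
    n = len(f)
    return [sum(f[a] * cmath.exp(2j * math.pi * a * k / n) for a in range(n))
            for k in range(n)]


def fft(f):
    """Recursive FFT for len(f) a power of 2, built on the even/odd split."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    m = n // 2
    even = fft(f[0::2])   # transform of the even-numbered points
    odd = fft(f[1::2])    # transform of the odd-numbered points
    # The half-length transforms are periodic with period m, so index k % m.
    return [even[k % m] + cmath.exp(2j * math.pi * k / n) * odd[k % m]
            for k in range(n)]


values = [0.0, 0.5, 0.4, 0.1, 0.0, 0.0, 0.0, 0.0]
direct = dft(values)
fast = fft(values)
print(max(abs(a - b) for a, b in zip(direct, fast)))
```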

In our applications, we use the FFT to invert the characteristic function
when discretization of the severity distribution is done. This is carried out as
follows:

1. Discretize the severity distribution using some methods such as those
described in the previous section, obtaining the discretized severity
distribution

$f_X(0), f_X(1), \ldots, f_X(n - 1)$,

where $n = 2^r$ for some integer r and n is the number of points desired
in the distribution $f_S(x)$ of aggregate claims.

2. Apply the FFT to this vector of values, obtaining $\varphi_X(z)$, the
characteristic function of the discretized distribution. The result is also a
vector of $n = 2^r$ values.

3. Transform this vector using the pgf transformation of the claim
frequency distribution, obtaining $\varphi_S(z) = P_N[\varphi_X(z)]$, which is the
characteristic function, that is, the discrete Fourier transform of the aggregate
claims distribution, a vector of $n = 2^r$ values.

4. Apply the Inverse Fast Fourier Transform (IFFT), that is, the inverse
mapping (6.31). This
gives a vector of length $n = 2^r$ values representing the distribution
of aggregate claims for the discretized severity model.

The FFT procedure requires a discretization of the severity distribution.
When the number of points in the severity distribution is less than $n = 2^r$,
the severity distribution vector must be padded with zeros until it is of length
n.

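When the severity distribution places probability on values beyond x = n,
as is the case with most distributions, the probability
that is missed in the right-hand tail beyond n can introduce some minor error
in the final solution because the function and its transform are both assumed
to be periodic with period n, when in reality they are not. One remedy is
putting all the remaining probability at the final point so that the
probabilities add up to 1 exactly. This allows for periodicity to be used for
the severity distribution in the FFT algorithm and ensures that the final set
of aggregate probabilities will sum to 1. However, it is imperative that n be
selected to be large enough so that almost all the aggregate probability occurs
by the nth point. The following example provides an extreme illustration.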

Example 6.28 Suppose the random variable X takes on the values 1, 2, and
3 with probabilities 0.5, 0.4, and 0.1, respectively. Further suppose the number
of claims has the Poisson distribution with parameter $\lambda = 3$. Use the FFT to
obtain the distribution of S using n = 8 and n = 4,096.

In either case, the probability distribution of X is completed by adding
one zero at the beginning (because S places probability at zero, the initial
representation of X must also have a value at zero) and either 4 or 4,092 zeros
at the end. The results from employing the FFT and IFFT
appear in Table 6.17. For the case n = 8, the eight probabilities sum to 1. For
the case n = 4,096, the probabilities also sum to 1, but there is not room here
to show them all. The recursive formula can be applied to this problem to
verify that the entries for n = 4,096 are accurate to the five
decimal places presented. On the other hand, with n = 8, the FFT gives
values that are clearly distorted. If any generalization can be made, it is that
more of the extra probability has been added to the smaller values of S. □

Table 6.17 Aggregate probabilities computed by the FFT and IFFT

        n = 8      n = 4,096
  s     f_S(s)     f_S(s)
  0     0.11227    0.04979
  1     0.11821    0.07468
  2     0.14470    0.11575
  3     0.15100    0.13256
  4     0.14727    0.13597
  5     0.13194    0.12525
  6     0.10941    0.10558
  7     0.08518    0.08305
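The four steps of the FFT procedure can be sketched for the setting of Example 6.28. This is a minimal illustration with a hand-written recursive transform (a library routine would normally be used); the `sign` argument, which switches between the forward mapping (6.30) and the inverse mapping (6.31), and the function names are assumptions of this sketch.

```python
import cmath
import math


def fft(f, sign=1):
    """Recursive radix-2 transform; sign=+1 follows (6.30), sign=-1 the inverse."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    m = n // 2
    even = fft(f[0::2], sign)
    odd = fft(f[1::2], sign)
    return [even[k % m] + cmath.exp(sign * 2j * math.pi * k / n) * odd[k % m]
            for k in range(n)]


def aggregate_poisson(severity, lam, n):
    """Aggregate probabilities on 0, 1, ..., n-1 for a compound Poisson model."""
    padded = list(severity) + [0.0] * (n - len(severity))    # pad with zeros
    phi_x = fft(padded, +1)                                  # cf of the severity
    phi_s = [cmath.exp(lam * (z - 1)) for z in phi_x]        # Poisson pgf applied
    inverse = fft(phi_s, -1)                                 # invert the transform
    return [value.real / n for value in inverse]             # division by n


severity = [0.0, 0.5, 0.4, 0.1]   # probabilities at 0, 1, 2, 3
g8 = aggregate_poisson(severity, 3.0, 8)
g4096 = aggregate_poisson(severity, 3.0, 4096)
for s in range(8):
    print(s, round(g8[s], 5), round(g4096[s], 5))
```

With n = 8 the transform returns the aggregate distribution wrapped around modulo 8, so each reported value is the sum of the true probabilities at s, s + 8, s + 16, and so on, which is why the n = 8 column is distorted.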

Because the FFT and IFFT algorithms are available in many computer
software packages and because the computer code is short, easy to write,
and available (e.g., [107], pp. 411-412), no further technical details about the
algorithm are given here. The reader can read any one of numerous books
dealing with FFTs for a more detailed understanding of the algorithm. The
technical details which allow the speeding up of the calculations from $O(n^2)$ to
$O(n \log_2 n)$ relate to the detailed properties of the discrete Fourier transform.
Robertson [111] gives a good explanation of the FFT as applied to calculating
the distribution of aggregate claims.

6.9.2 Direct numerical inversion 

The inversion of the characteristic function (6.28) has been done using
approximate integration methods by Heckman and Meyers [51] in the case of
Poisson, binomial, and negative binomial claim frequencies and continuous
severity distributions. The method is easily extended to other frequency
distributions.

In this method, the severity distribution function is replaced by a piecewise
linear distribution. It further uses a maximum single-loss amount, so the cdf
jumps to 1 at that amount. Consider the cdf of the
severity distribution $F_X(x)$, $0 \le x < \infty$. Let $0 = x_0 < x_1 < \cdots < x_n$ be
arbitrarily selected loss values. Then the probability that losses lie in the
interval $(x_{k-1}, x_k]$ is given by $f_k = F_X(x_k) - F_X(x_{k-1})$. Using a uniform
density $d_k$ over this interval results in the approximating density function
$f^*(x) = d_k = f_k/(x_k - x_{k-1})$ for $x_{k-1} < x \le x_k$. Any remaining probability
$f_{n+1} = 1 - F_X(x_n)$ is placed as a spike at $x_n$. This approximating pdf
is selected to make evaluation of the cf easy. It is not required for direct
inversion. The cf of the approximating severity distribution is
I deviation of the distribution of aggregate losses. Ap-
for any value
details. They
also obtain the net stop-loss (excess pure) premium for the aggregate loss
distribution as

\[
E[(S - d)_+] = \int_d^{\infty} (s - d)\,dF_S(s)
\]

from (6.33), where $\mu$ is the mean of the aggregate loss distribution and d is
the deductible. Equation (6.33) provides only a single value of the distribution, while (6.34)
provides only one value of the premium, but it does so quickly. The error of
approximation depends on the spacing of the numerical integration method
but is controllable.

6.9.3 Exercise 

6.59 Repeat Exercises 6.51 and 6.52 using the inversion method. 


6.10 COMPARISON OF METHODS 

The recursive method has some significant advantages. The time required to
compute an entire distribution of n points is reduced to $O(n^2)$ from $O(n^3)$ for
the direct convolution method. Furthermore, it provides exact values when
the severity distribution is itself discrete. The only source of
error is in the discretization of the severity distribution. Except for binomial
models, the calculations are guaranteed to be numerically stable. The method
is very easy to program in a few lines of computer code. However, it has a few
disadvantages. The recursive method only works for the classes of frequency
distributions described in Section 4.6. Using distributions not based on the
(a, b, 0) and (a, b, 1) classes requires modification of the formula or developing
a new recursion. Numerous other recursions have recently been developed in
the actuarial and statistical literature.

The FFT method is easy to use in that it uses standard routines available

be of order n, rather than $n^2$ in the case of no upper limit of the severity.
The FFT method can be extended to the case where the severity distribution 
can take on negative values. Like the recursive method, it produces the entire 
distribution. 

The direct inversion method has been demonstrated to be very fast in
calculating a single value of the aggregate distribution or the net stop-loss
(excess pure) premium for a single deductible d. However, it requires a major
computer programming effort. It has been developed by Heckman and Meyers
[51] specifically for (a, b, 0) frequency models. It is possible to generalize the
computer code to handle any distribution with a pgf that is a relatively simple
function. This method is much faster than the recursive method when the
expected number of claims is large. The speed does not depend on the size of
$\lambda$ in the case of the Poisson frequency model. In addition to being complicated
to program, the method involves approximate integration whose errors depend
on the method and interval size.

Through the use of transforms, both the FFT and inversion methods are
able to handle convolutions efficiently. For example, suppose a reinsurance
agreement covers the aggregate losses of three groups, each with different
frequency and severity distributions. If $S_i$, i = 1, 2, 3, are the aggregate
losses for each group, the characteristic function for the total aggregate losses
$S = S_1 + S_2 + S_3$ is $\varphi_S(z) = \varphi_{S_1}(z)\varphi_{S_2}(z)\varphi_{S_3}(z)$ and so the only extra work is
some multiplications prior to the inversion step. The recursive method does
not accommodate convolutions as easily.

The Heckman-Meyers method has some technical difficulties when being
applied to severity distributions that are of the discrete type or have some
anomalies, such as heaping of losses at some round number (e.g., 1,000,000).
At any jump in the severity distribution function, a very short interval
containing the jump needs to be defined in setting up the points $(x_1, x_2, \ldots, x_n)$.

We save a discussion of simulation for last because it differs greatly from
the other methods. For those not familiar with this method, an introduction is
provided in Chapter 17. The major advantage is a big one. If you can carefully
articulate the model, you should be able to obtain the aggregate distribution
by simulation. The programming effort may take a little time but can be done
in a straightforward manner. Today's computers will conduct the simulation
in a reasonable amount of time. Most of the analytic methods were developed
as a response to the excessive computing time that simulations used to require.

does, the tail is extremely long (for example, a Pareto distribution with small
$\alpha$). The simulation will have to discard 99% of the generated losses and then
will need a large number of those that exceed the deductible (due to the large
variation in losses). It may take a long time to obtain a reliable answer. One
possible solution for simulation is to work with the conditional distribution of
the loss variable, given that a payment has been made.

No method is clearly superior for all problems. Each method has both 
advantages and disadvantages when compared with the others. What we 
really have is an embarrassment of riches. Twenty-five years ago, actuaries 
wondered if there would ever be effective methods for determining aggregate 
distributions. Today we can choose from several. 


6.11 THE INDIVIDUAL RISK MODEL 
6.11.1 Parametric approximation 

The individual risk model represents the aggregate loss as a fixed sum of
independent (but not necessarily identically distributed) random variables:

\[
S = X_1 + X_2 + \cdots + X_n.
\]

This is usually thought of as the sum of the losses from n insurance
contracts, for example, n persons covered under a group insurance policy.

The individual risk model was originally developed for life insurance in
which the probability of death within a year is $q_j$ and the fixed benefit paid
for the death of the jth person is $b_j$. In this case, the distribution of the loss
to the insurer for the jth policy is

\[
f_{X_j}(x) = \begin{cases} 1 - q_j, & x = 0, \\ q_j, & x = b_j. \end{cases}
\]

In this case the mean and variance of aggregate losses are

\[
E(S) = \sum_{j=1}^{n} b_j q_j
\]

and

\[
\mathrm{Var}(S) = \sum_{j=1}^{n} b_j^2 q_j (1 - q_j)
\]

because the $X_j$s are assumed to be independent. Then, the pgf of aggregate
losses is

\[
P_S(z) = \prod_{j=1}^{n} (1 - q_j + q_j z^{b_j}). \tag{6.35}
\]

In the special case where all the risks are identical with $q_j = q$ and $b_j = 1$,
the pgf reduces to

\[
P_S(z) = [1 + q(z - 1)]^n,
\]

and in this case S has a binomial distribution.

The individual risk model can be generalized as follows. Let $X_j = I_j B_j$,
where $I_1, \ldots, I_n, B_1, \ldots, B_n$ are independent. The random variable $I_j$ is an
indicator variable that takes on the value 1 with probability $q_j$ and the value
0 with probability $1 - q_j$. This variable indicates whether or not the jth policy
produced a payment. The random variable $B_j$ can have any distribution and
represents the amount of the payment in respect of the jth policy given that
a payment was made. In the life insurance case, $B_j$ is degenerate, with all
probability on the value $b_j$. If we let $\mu_j = E(B_j)$ and $\sigma_j^2 = \mathrm{Var}(B_j)$, then

\[
E(S) = \sum_{j=1}^{n} q_j \mu_j \tag{6.36}
\]

and

\[
\mathrm{Var}(S) = \sum_{j=1}^{n} [q_j \sigma_j^2 + q_j (1 - q_j) \mu_j^2]. \tag{6.37}
\]

You are asked to verify these formulas in Exercise 6.60. The following example
is a simple version of this situation.


Example 6.29 Consider a group life insurance contract with an accidental
death benefit. Assume that for all members the probability of death in the
next year is 0.01 and that 30% of deaths are accidental. For 50 employees
the benefit for an ordinary death is 50,000 and for an accidental death it is
100,000. For the remaining 25 employees the benefits are 75,000 and 150,000,
respectively. Develop an individual risk model and determine its mean and
variance.

For all 75 employees $q_j = 0.01$. For 50 employees, $B_j$ takes on the value
50,000 with probability 0.7 and 100,000 with probability 0.3. For them, $\mu_j = 65,000$
and $\sigma_j^2 = 525,000,000$. For the remaining 25 employees $B_j$ takes on
the value 75,000 with probability 0.7 and 150,000 with probability 0.3. For
them, $\mu_j = 97,500$ and $\sigma_j^2 = 1,181,250,000$. Then

\[
E(S) = 50(0.01)(65,000) + 25(0.01)(97,500) = 56,875
\]

and

\[
\begin{aligned}
\mathrm{Var}(S) &= 50(0.01)(525,000,000) + 50(0.01)(0.99)(65,000)^2 \\
&\quad + 25(0.01)(1,181,250,000) + 25(0.01)(0.99)(97,500)^2 \\
&= 5,001,984,375.
\end{aligned}
\]
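The arithmetic of Example 6.29 follows directly from (6.36) and (6.37). The sketch below is illustrative only; the grouping of employees into `(count, q, outcomes)` tuples and the function names are assumptions of this example.

```python
def benefit_moments(outcomes):
    """Mean and variance of B_j from (amount, probability) pairs."""
    mean = sum(amount * prob for amount, prob in outcomes)
    second = sum(amount ** 2 * prob for amount, prob in outcomes)
    return mean, second - mean ** 2


def individual_risk_moments(groups):
    """Mean and variance of S; groups holds (count, q, outcomes) tuples."""
    total_mean = 0.0
    total_var = 0.0
    for count, q, outcomes in groups:
        mu, sigma2 = benefit_moments(outcomes)
        total_mean += count * q * mu
        total_var += count * (q * sigma2 + q * (1 - q) * mu ** 2)
    return total_mean, total_var


groups = [
    (50, 0.01, [(50_000, 0.7), (100_000, 0.3)]),
    (25, 0.01, [(75_000, 0.7), (150_000, 0.3)]),
]
mean_s, var_s = individual_risk_moments(groups)
print(mean_s, var_s)
```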
Table 6.18

Employee j    Age (years)
     1            20
     2            23
     3            27
     4            30
     5            31
     6            46
     7            47
     8            49
     9            64
    10            17
    11            22
    12            26
    13            37
    14            55
Total: 373,000

When the risks are different, the probabilities defined by pgf (6.35) can
be computed exactly or approximately. A normal, gamma, lognormal, or any
other distribution can be used to approximate the distribution. This is usually
done by matching the first few moments. Because the normal, gamma, and
lognormal distributions each have two parameters, the mean and variance are
sufficient.

Example 6.30 (Group life insurance) A small manufacturing business has 
a group life insurance contract on its 14 permanent employees. The actuary




are the chances that it will lose money in a given year? Use the normal and
lognormal approximations.

The mean and variance of the aggregate losses for the group are

\[
E(S) = \sum_{j=1}^{14} b_j q_j = 2,054.41
\]

and

\[
\mathrm{Var}(S) = \sum_{j=1}^{14} b_j^2 q_j (1 - q_j) = 1.02534 \times 10^8.
\]

The premium being charged is 1.45 x 2,054.41 = 2,978.89. For the normal
approximation (in units of 1,000), the mean is 2.05441 and the variance is
102.534. Then the probability of a loss is

\[
\Pr(S > 2.97889) = \Pr\left[Z > \frac{2.97889 - 2.05441}{(102.534)^{1/2}}\right]
= \Pr(Z > 0.0913) = 0.46 \text{ or } 46\%.
\]

For the lognormal approximation (as in Example 6.5)

\[
\mu + \tfrac{1}{2}\sigma^2 = \ln 2.05441 = 0.719989
\]

and

\[
2\mu + 2\sigma^2 = \ln(102.534 + 2.05441^2) = 4.670533.
\]

From this $\mu = -0.895289$ and $\sigma^2 = 3.230555$. Then

\[
\Pr(S > 2.97889) = 1 - \Phi\left[\frac{\ln 2.97889 + 0.895289}{(3.230555)^{1/2}}\right]
= 1 - \Phi(1.105) = 0.13 \text{ or } 13\%.
\]
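The two moment-matching approximations in Example 6.30 can be reproduced from the stated mean and variance. The sketch below takes those two moments (in units of 1,000) as given inputs and computes the standard normal tail with `math.erfc`; the variable and function names are illustrative.

```python
import math


def normal_sf(z):
    """Upper tail probability 1 - Phi(z) of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


mean, var = 2.05441, 102.534   # aggregate moments in units of 1,000
premium = 1.45 * mean          # 45% relative loading on the mean

# Normal approximation: match the mean and variance directly.
p_normal = normal_sf((premium - mean) / math.sqrt(var))

# Lognormal approximation: solve the two moment equations for mu and sigma^2.
sigma2 = math.log(var + mean ** 2) - 2 * math.log(mean)
mu = math.log(mean) - sigma2 / 2
p_lognormal = normal_sf((math.log(premium) - mu) / math.sqrt(sigma2))

print(round(mu, 6), round(sigma2, 6), round(p_normal, 2), round(p_lognormal, 2))
```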


In the next subsection we present several ways of obtaining the exact dis¬ 
tribution of S for the case where the benefit amounts are fixed. 


6.11.2 Exact calculation of the aggregate distribution 

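6.11.2.1 Direct calculation The pf of aggregate losses is given by

\[
f_S(x) = f_{X_1} * f_{X_2} * \cdots * f_{X_n}(x), \tag{6.38}
\]

where

\[
f_{X_j}(x) = \begin{cases} p_j = 1 - q_j, & x = 0, \\ q_j, & x = b_j. \end{cases}
\]

The density (6.38) can be calculated recursively over the partial sums
$S_j = S_{j-1} + X_j$ for $j = 2, 3, \ldots, n$ beginning with $S_1 = X_1$. Then

\[
\begin{aligned}
f_{S_j}(x) &= \begin{cases} f_{S_{j-1}}(x) f_{X_j}(0), & x < b_j, \\ f_{S_{j-1}}(x) f_{X_j}(0) + f_{S_{j-1}}(x - b_j) f_{X_j}(b_j), & x \ge b_j, \end{cases} \\
&= \begin{cases} p_j f_{S_{j-1}}(x), & x < b_j, \\ p_j f_{S_{j-1}}(x) + q_j f_{S_{j-1}}(x - b_j), & x \ge b_j. \end{cases}
\end{aligned}
\]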



Table 6.19 Cumulative probabilities $F_S(x)$


0.96157969 

0.96157969 ! 

0.96157969 

0.96157969 

0.96621337 

0.96621337 

0.96998201 

0.96998201 


0.9721295U 

0.97330507 

0.97330747 

0.97331344 

0.97331962 


0.97335098 

0.97335892 

0.97338128 

0.97338740 

0.97340884 

0.97341351 

0.97342561 

0.97342840 

0.97343397 

0.97343866 

0.97345889 

0.97346040 

0.97346606 

0.97346608 

0.97347547 

0.97806678 

0.97807068 

0.97807536 

0.97807660 

0.97807808 


0.99933062 

0.99933187 

0.99933191 

0.99933193 

0.99933198 

0.99933202 

0.99933206 

0.99933209 

0.99933217 

0.99933450 

0.99934141 

0.99934796 

0.99935031 

0.99936659 

0.99937973 

0.99941735 

0.99944759 

0.99945823 

0.99953355 

0.99956734 


If we wish to calculate the distribution of total claims up to some value r,
the computer time involved, as measured by the number of multiplications,
can be seen to be of order nr. If both r and n are large (e.g., r = 10,000 and
n = 10,000), the number of computations can be prohibitive.

Example 6.31 (Example 6.30 continued) Use the direct method to determine
the pf of S as well as the probability required in Example 6.30.

\[
\begin{aligned}
f_{S_1}(0) &= 0.99851, \\
f_{S_1}(15) &= 0.00149, \\
f_{S_2}(0) &= p_2 f_{S_1}(0) = 0.99709212, \\
f_{S_2}(15) &= p_2 f_{S_1}(15) = 0.00148788, \\
f_{S_2}(16) &= p_2 f_{S_1}(16) + q_2 f_{S_1}(0) = 0.00141788, \\
f_{S_2}(31) &= p_2 f_{S_1}(31) + q_2 f_{S_1}(15) = 0.0000021158.
\end{aligned}
\]

The final values of the cdf $F_S(x)$ for $x = 0, \ldots, 79$ are given in Table 6.19.
From Table 6.19, the probability of exceeding 2,978.89 is seen to be 0.047,
which shows that both approximations in Example 6.30 are poor. □
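The first two steps of the direct method in Example 6.31 can be sketched as a repeated convolution with a two-point distribution. The values $q_1 = 0.00149$ and $b_1 = 15$ are read from $f_{S_1}$ above; $q_2 = 0.00142$ and $b_2 = 16$ are not stated in the surviving text and are inferred here from the printed values of $f_{S_2}$, so they should be treated as assumptions. Function names are illustrative.

```python
def add_policy(pf, benefit, q):
    """Convolve the pf of a partial sum with a policy paying `benefit` w.p. q."""
    new_pf = [0.0] * (len(pf) + benefit)
    for x, prob in enumerate(pf):
        new_pf[x] += (1 - q) * prob          # no claim on the added policy
        new_pf[x + benefit] += q * prob      # claim of size `benefit`
    return new_pf


def direct_pf(policies):
    """pf of S for a list of (benefit, q) pairs, benefits in integer units."""
    pf = [1.0]                               # empty sum: all mass at zero
    for benefit, q in policies:
        pf = add_policy(pf, benefit, q)
    return pf


# (benefit in units of 1,000, claim probability); the second pair is inferred.
policies = [(15, 0.00149), (16, 0.00142)]
f_s2 = direct_pf(policies)
for x in (0, 15, 16, 31):
    print(x, f_s2[x])
```

Each added policy costs work proportional to the current length of the pf, which is the source of the order nr operation count noted above.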


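This approach is reasonable when n is not too large, but for larger groups
the number of computations can be prohibitive.

6.11.2.2 Recursive calculation A recursive formula is available for the
distribution of total claims for a
portfolio. Let $n_{ij}$ be the number of policies with benefit i (where $i = 1, 2, \ldots, r$)³ and claim
probability $q_j$ (where $j = 1, 2, \ldots, m$). Then the pgf of total claims may be
written as

\[
P_S(z) = \prod_{i=1}^{r} \prod_{j=1}^{m} (1 - q_j + q_j z^i)^{n_{ij}}.
\]

The logarithm of the pgf is

\[
\ln P_S(z) = \sum_{i=1}^{r} \sum_{j=1}^{m} n_{ij} \ln(1 - q_j + q_j z^i). \tag{6.39}
\]

We now differentiate (6.39) to obtain

\[
P_S'(z) = P_S(z) \sum_{i=1}^{r} \sum_{j=1}^{m} n_{ij} \frac{i q_j z^{i-1}}{1 - q_j + q_j z^i}. \tag{6.40}
\]

Setting z = 1 in (6.40) yields the mean of the total claims distribution, namely,

\[
E(S) = P_S'(1) = \sum_{i=1}^{r} \sum_{j=1}^{m} i\,n_{ij} q_j.
\]

Now, (6.40) may be rewritten as

\[
\begin{aligned}
z P_S'(z) &= P_S(z) \sum_{i=1}^{r} \sum_{j=1}^{m} i\,n_{ij} \left(\frac{q_j}{1 - q_j} z^i\right) \left(1 + \frac{q_j}{1 - q_j} z^i\right)^{-1} \\
&= P_S(z) \sum_{i=1}^{r} \sum_{j=1}^{m} i\,n_{ij} \sum_{k=1}^{\infty} (-1)^{k-1} \left(\frac{q_j}{1 - q_j}\right)^k z^{ik}
\end{aligned} \tag{6.41}
\]

for $|z| < \min_j [q_j^{-1}(1 - q_j)]^{1/r}$. The second term on the right-hand side of
(6.41) may be rewritten as

\[
\sum_{i=1}^{r} \sum_{k=1}^{\infty} h(i, k) z^{ik},
\quad \text{where} \quad
h(i, k) = i(-1)^{k-1} \sum_{j=1}^{m} n_{ij} \left(\frac{q_j}{1 - q_j}\right)^k. \tag{6.42}
\]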


³ As in the discretization of severity distributions, it is necessary that the benefit
amounts be in arithmetic progression. However, the monetary unit need not be 1. For
example, i = 1, 2, ... could represent benefit amounts of 5,000, 10,000, ....












Thus, (6.41) may be written as

\[
z P_S'(z) = P_S(z) \left[\sum_{i=1}^{r} \sum_{k=1}^{\infty} h(i, k) z^{ik}\right]. \tag{6.43}
\]

The coefficient of 2： 1 on the left-hand side of (6.43) is xfs{x), where fs{x) is 
the coefficient of z x in P s {z). The right-hand side of (6.43) is a convolution, 
and the coefficient of z x is thus given by 

Y, Ki,k)fs{x-ik). (6.44) 

ik<x 

A simpler way of writing (6.44) is 

x lx/i\ 

^2 h (^ k )fs(,^ - ik ) 

i=l fe=l 

where denotes the greatest integer function, that is, the largest integer 
that is less than or equal to the argument. Finally, because h{i,k) = 0 if 
i > x, one may equate coefficients of z 1 on both sides of (6.43) and divide by 
x to obtain .… 


x to obtain 

fs{x) 

-xAr [x/H 

= - h(i, k)fs(x - ik), x>l. 

^ i=\ k=l 

(6.45) 

Now,

$$f_S(0) = P_S(0) = \prod_{i=1}^{r}\prod_{j=1}^{m}(1 - q_j)^{n_{ij}}, \tag{6.46}$$

and from (6.45),

$$f_S(1) = h(1,1)\,f_S(0),$$

$$f_S(2) = \tfrac{1}{2}\left\{h(1,1)\,f_S(1) + [h(1,2) + h(2,1)]\,f_S(0)\right\}.$$

The probabilities $\{f_S(x);\ x = 1, 2, \ldots\}$ may be calculated recursively using
(6.45), beginning with (6.46).
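The recursion can be sketched in a few lines of code. The following Python fragment is only an illustration, not part of the original development: it assumes that $h(i,k) = i(-1)^{k-1}\sum_j n_{ij}[q_j/(1-q_j)]^k$, uses a small hypothetical portfolio (the arrays `n` and `q` are invented for the check), and compares the result with a direct policy-by-policy convolution.

```python
def de_pril_pf(n, q, x_max):
    """Probabilities f_S(0), ..., f_S(x_max) from the recursion (6.45)-(6.46).

    n[i-1][j] is the number of policies with benefit i and claim
    probability q[j]; this array layout is an illustrative choice.
    """
    r, m = len(n), len(q)

    def h(i, k):
        # Assumed form: h(i,k) = i (-1)^(k-1) sum_j n_ij [q_j/(1-q_j)]^k
        return i * (-1) ** (k - 1) * sum(
            n[i - 1][j] * (q[j] / (1.0 - q[j])) ** k for j in range(m))

    f = [0.0] * (x_max + 1)
    f[0] = 1.0
    for i in range(r):                      # starting value (6.46)
        for j in range(m):
            f[0] *= (1.0 - q[j]) ** n[i][j]
    for x in range(1, x_max + 1):           # recursion (6.45)
        total = 0.0
        for i in range(1, min(x, r) + 1):
            for k in range(1, x // i + 1):
                total += h(i, k) * f[x - i * k]
        f[x] = total / x
    return f


def brute_force_pf(n, q, x_max):
    """Same probabilities by convolving one two-point policy at a time."""
    f = [1.0] + [0.0] * x_max
    for benefit, row in enumerate(n, start=1):
        for j, count in enumerate(row):
            for _ in range(count):
                g = [0.0] * (x_max + 1)
                for x, p in enumerate(f):
                    g[x] += (1.0 - q[j]) * p
                    if x + benefit <= x_max:
                        g[x + benefit] += q[j] * p
                f = g
    return f


# Hypothetical portfolio: benefits 1 and 2, claim probabilities 0.01 and 0.05.
n = [[2, 1], [1, 2]]
q = [0.01, 0.05]
x_max = 9                                   # largest possible total claim
recursive = de_pril_pf(n, q, x_max)
direct = brute_force_pf(n, q, x_max)
print(max(abs(a - b) for a, b in zip(recursive, direct)))
```

The two calculations should agree to rounding error, which gives a quick check on any implementation of the recursion.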

It can be seen from (6.42) that $h(i,k)$ is a weighted sum of the $k$th power
of $q_j/(1-q_j)$, $j = 1, 2, \ldots, m$. When $q_j$ is close to zero, $q_j/(1-q_j)$ is small.
Consequently, the magnitude of $h(i,k)$ decreases rapidly as $k$ increases. This
suggests that the inner summation in (6.45) can be limited to a small number
of terms, speeding up the computations considerably without significant loss
of accuracy.

If we limit $k$ to a maximum of $K$ terms, let

$$f_S^{(K)}(x) = \frac{1}{x}\sum_{i=1}^{x\wedge r}\sum_{k=1}^{K\wedge\lfloor x/i\rfloor} h(i,k)\,f_S^{(K)}(x - ik) \tag{6.47}$$




1 0.50 1 0 0 1 0 

2 0.54 0 0 0 0 0 

3 1.03 0 0 0 0 0 

4 1.22 0 0 0 0 0 

5 1.23 0 0 0 0 0 

6 1.28 0 0 0 0 0 

7 1.42 0 0 1 0 0 

8 1.49 0 1 0 0 0 

9 3.53 0 0 0 0 1 

10 3.94 0 0 0 0 0 

11 4.79 0 0 0 0 0 

12 4.84 0 0 0 0 0 

13 21.82 0 0 0 0 0 


denote the approximation using at most $K$ terms in (6.45). In a later paper,
De Pril [27] shows that, if $q_j < \tfrac{1}{2}$, $j = 1, 2, \ldots, m$, then




6 ^ = KTi^ nij : 


and M = ^ n ij the maximum possible aggregate claim amount. 

The value of $\delta(K)$ is easily calculated for any value of $K$. Equation (6.48)








7.0035018X10 -3 
2.2383351 xl(T 2 
2.2752309xl(T 2 
8.5042522x10— 3 
6.3765090X1CT 2 
1.0265543 xlO -2 
2.5632810 xlO -2 
1.1672495x10— 1 
1.0284521 xltT 1 
3.4201726xl(T 2 
3.0931860xl0 _2 
3.8176959xl0_ 2 
2.6471800X10 -1 
1.3384040 


-3.5035025 xlO_ b 
-3.3400962xl0 _s 
-3.2354221X10 -5 


-5.5463884xl0_ 6 
-3.2852048 xlO -5 
-5.6769638xl0 -4 
-4.0681297X10 -4 
-4.1777074x10 一 5 
-3.1892665 xlO -5 
-4.7015487X10 -5 
-1.2741022x ' ° 
-2.9855420 x 


1.7526276x10— 9 
4.9841694x10— 8 
4.6008325 xlO -8 
2.1281907x10— 9 
8.0020991xl0 _7 
2.9966680 xlO -9 
4.2104515X10 -8 
2.7610139 xlO -6 
1.6091833xlO -6 
5.1030287X10— 8 
3.2883315xlO -8 
5.7900266X10 -8 


— 厶丄 o 入丄 u 

-7.4374944xlO -11 :' 
-6.5424723 xlO -11 
-1.0646277xl0~ 12 
-2.8347477X10 -9 
-1.6190750xl0- 12 
-5.3962851 xlO -11 
-1.3428300xl0~ 8 
-6.3652612 xlO -9 
14 4.7660783X10 -4

15 1.4216995xl0— 3 

16 1.3548133xl0- a 

17 4.7660783 xlO~ 4 

18 3.3750829 xlO -3 

19 5.1475706xl0- 4 

20 1.2210690 xlO -3 

24 4.6336840 xlO -3 

26 3.7686403x10- 3 

28 1.1637614xl0- 3 

29 7.1120536xl0- 7 

30 9.8301077xl0- 4 

31 1.1755723xl0— 3 

32 2.3995910 xlO" 6 

33 5.9716305 xlO -6 

34 6.1784053x10— 6 

35 4.2424878xl0— 6 

36 1.9938909 xlO -6 

37 2.4343694xl0- 6 


Fsjx) 

0.95273905 

0.95321566 

0.95463736 

0.95599217 

0.95646878 

0.95984386 

0.96035862 

0.96157969 

0.96621337 

0.96998201 

0.97114577 

0.97114649 

0.97212950 

0.97330507 

0.97330747 

0.97331344 

0.97331962 

0.97332386 

0.97332585 

0.97332829 


babilities for Example 6.32 

T fc( 

"~38~6.6436439 xlO~ b 

39 7.5742253xl0— 6 

40 8.4744508xl0— 6 

41 7.9416543 xlO" 6 

42 2.2356100x10— 5 

43 6.1253961x10^ 

44 2.1435448 xl(T 5 

45 4.6721584x10— 6 

46 1.2100769x10 一 5 

47 2.7915140x10 一 6 

48 5.5621896x10— 5 

49 4.6964950x10 一 6 

50 2.0226469x10— 5 

51 1.5103585xl0- 6 

5.6661624 " 


54 9.3923168x10*^ 

55 4.5913084x10— 2 

56 3.9003832 xlO~ € 

57 4.6823253x10— € 


Fs(x) 

0.97333493 

0.97334251 

0.97335098 

0.97335892 

0.97338128 

0.97338740 

0.97340884 

0.97341351 

0.97342561 

0.97342840 

0.97343397 

0.97343866 

0.97345889 

0.97346040 


.97346608 

0.97347547 

0.97806678 

0.97807068 

0.97807536 


6.11.3 Compound Poisson approximation 

Because of the computational complexity


By taking logarithms and using a Taylor series expansion of k 
Table 6.23 Aggregate distribution for Example 6.33

  x    F_S(x)         x    F_S(x)         x    F_S(x)         x    F_S(x)
  0    0.00789581    200   0.51793382    400   0.96031865
  8    0.00789581    208   0.55208736    408   0.96528977    608   0.99939248
 16    0.01059183    216   0.57594360    416   0.96999808    616   0.99950270
 24    0.01906263    224   0.60583632    424   0.97360199    624   0.99958630
 32    0.02561396    232   0.63412023    432   0.97694855    632   0.99965590
 40    0.02979597    240   0.66687534    440   0.98031827    640   0.99971784
 48    0.03774849    248   0.68632432    448   0.98297484    648   0.99976679
 56    0.04770335    256   0.71292641    456   0.98515668    656   0.99980876
 64    0.06976756    264   0.73984823    464   0.98728185    664   0.99984205
 72    0.07610528    272   0.76061061    472   0.98914069    672   0.99987063
 80    0.10013432    280   0.78007667    480   0.99068429    680   0.99989481
 88    0.12399672    288   0.80016248    488   0.99194854    688   0.99991371
 96    0.13839960    296   0.82073766    496   0.99321771    696   0.99992950
104    0.15945674    304   0.83672920    504   0.99421982    704   0.99994275
112    0.18004988    312   0.85121374    512   0.99504627    712   0.99995364
120    0.22162539    320   0.86849112    520   0.99580992    720   0.99996222
128    0.23569706    328   0.88253629    528   0.99644624    728   0.99996932
136    0.26335063    336   0.89343529    536   0.99701494    736   0.99997545
144    0.30172723    344   0.90530546    544   0.99745896    744   0.99998004
152    0.33041683    352   0.91610001    552   0.99785787    752   0.99998383
160    0.35497038    360   0.92599796    560   0.99821941    760   0.99998707
168    0.38635038    368   0.93340851    568   0.99850204    768   0.99998955
176    0.42177800    376   0.94180939    576   0.99873887    776   0.99999162
184    0.45476409    384   0.94903173    584   0.99895067    784   0.99999326
192    0.47881084    392   0.95482932    592   0.99912971    792   0.99999461


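Retaining only the first term in the inner sum yields the approximation

$$\ln P_S(z) = \sum_{j=1}^{n} q_j\left(z^{b_j} - 1\right) = \lambda\sum_{j=1}^{n}\frac{\lambda_j}{\lambda}\left(z^{b_j} - 1\right), \tag{6.50}$$

where $\lambda_j = q_j$ and $\lambda = \sum_{j=1}^{n}\lambda_j$. This results in

$$P_S(z) = \exp\{\lambda[P_X(z) - 1]\},$$

which is the pgf of a compound Poisson distribution with individual loss
distribution pgf

$$P_X(z) = \sum_{j=1}^{n}\frac{\lambda_j}{\lambda}\,z^{b_j}. \tag{6.51}$$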
Table 6.24 Aggregate distribution for Example 6.34

 x   F_S(x)       x   F_S(x)       x   F_S(x)       x   F_S(x)
 0   0.9530099   20   0.9618348   40   0.9735771   60   0.9990974
 1   0.9530099   21   0.9618348   41   0.9735850   61   0.9990986
 2   0.9530099   22   0.9618348   42   0.9736072   62   0.9990994
 3   0.9530099   23   0.9618348   43   0.9736133   63   0.9990995
 4   0.9530099   24   0.9664473   44   0.9736346   64   0.9990995
 5   0.9530099   25   0.9664473   45   0.9736393   65   0.9990996
 6   0.9530099   26   0.9702022   46   0.9736513   66   0.9990997
 7   0.9530099   27   0.9702022   47   0.9736541   67   0.9990997
 8   0.9530099   28   0.9713650   48   0.9736708   68   0.9990998
 9   0.9530099   29   0.9713657   49   0.9736755   69   0.9991022
10   0.9530099   30   0.9723490   50   0.9736956   70   0.9991091
11   0.9530099   31   0.9735235   51   0.9736971   71   0.9991156
12   0.9530099   32   0.9735268   52   0.9737101   72   0.9991179
13   0.9530099   33   0.9735328   53   0.9737102   73   0.9991341
14   0.9534864   34   0.9735391   54   0.9737195   74   0.9991470
15   0.9549064   35   0.9735433   55   0.9782901   75   0.9991839
16   0.9562597   36   0.9735512   56   0.9782947   76   0.9992135
17   0.9567362   37   0.9735536   57   0.9782994   77   0.9992239
18   0.9601003   38   0.9735604   58   0.9783006   78   0.9992973
19   0.9606149   39   0.9735679   59   0.9783021   79   0.9993307



This distribution has pf

$$\Pr(X = x) = \frac{\sum_{\{j:\,b_j = x\}}\lambda_j}{\lambda}. \tag{6.52}$$

The numerator sums all probabilities associated with amount $b_j$.

Note that the means of the frequency distribution and the aggregate loss 
distribution match those of the exact distribution. 

Example 6.34 (Example 6.30 continued) Consider the group life case of
Example 6.30. Derive a compound Poisson approximation.

Using the compound Poisson approximation of this section with Poisson
parameter $\lambda = \sum q_j = 0.04813$, the distribution function given in Table 6.24
is obtained.

When these values are compared to those of Example 6.31, it can be seen
that the maximum error of 0.0002708 occurs at $x = 0$. □

Some closely related approximations have been used. A popular choice is
to set $\lambda_j$ in (6.50) to

$$\lambda_j = -\ln(1 - q_j), \quad j = 1, 2, \ldots, n. \tag{6.53}$$

This matches the no-loss probability $1 - q_j$ with the no-loss probability of the
Poisson distribution, $e^{-\lambda_j}$. This effectively replaces each life in the group by
a Poisson distribution. This approximation is appropriate in the context of a
group life insurance contract where a life is "replaced" upon death, leaving the
Poisson intensity unchanged by the death. Naturally the expected number of
losses is greater than $\sum_{j=1}^{n} q_j$. An alternative choice was proposed by Kornya
[79]. It used $\lambda_j = q_j/(1 - q_j)$ in (6.50). It results in an expected number of
losses that exceeds that using (6.53) (see Exercise 6.61).

6.11.3.1 More than one possible loss amount It was noted in the beginning
of this section that there may be more than one possible loss amount. Again
let $B_j$ be the random variable that measures the amount of the loss given that
there was a loss and let $X_j = I_j B_j$. Then

$$P_{X_j}(z) = 1 - q_j + q_j P_{B_j}(z).$$

The pgf corresponding to (6.35) is

$$P_S(z) = \prod_{j=1}^{n}\left[1 - q_j + q_j P_{B_j}(z)\right].$$

Although it is possible to extend the exact computational methods to this
case, it is quite cumbersome. However, the compound Poisson approximation
based on matching of moments (6.50) simply requires replacing $z^{b_j}$ by $P_{B_j}(z)$.
Then, the pgf of the severity distribution (6.51) becomes

$$P_X(z) = \frac{1}{\lambda}\sum_{j=1}^{n}\lambda_j P_{B_j}(z),$$

and so

$$f_X(x) = \frac{1}{\lambda}\sum_{j=1}^{n}\lambda_j f_{B_j}(x), \tag{6.54}$$

which is a weighted average of the $n$ individual severity densities. Extensions
to continuous severity distributions also satisfy (6.54) for all values of $x$.

Example 6.35 (Example 6.29 continued) Develop compound Poisson
approximations using all three methods suggested here. Compute the mean and
variance for each approximation and compare it to the exact value.

Using the method that matches the mean, we have $\lambda = 50(0.01) + 25(0.01) = 0.75$.
The severity distribution is

$$f_X(50{,}000) = \frac{50(0.01)(0.7)}{0.75} = 0.4667,$$

$$f_X(75{,}000) = \frac{25(0.01)(0.7)}{0.75} = 0.2333,$$

$$f_X(100{,}000) = \frac{50(0.01)(0.3)}{0.75} = 0.2000,$$

$$f_X(150{,}000) = \frac{25(0.01)(0.3)}{0.75} = 0.1000.$$

The mean is $\lambda E(X) = 0.75(75{,}833.33) = 56{,}875$, which matches the exact
value, and the variance is $\lambda E(X^2) = 0.75(6{,}729{,}166{,}667) = 5{,}046{,}875{,}000$,
which exceeds the exact value.

For the method that preserves the probability of no losses, $\lambda = -75\ln(0.99)
= 0.753775$. For this method, the severity distribution turns out to be exactly
the same as before (this is due to the fact that all individuals have the same
value of $q_j$). Thus the mean is 57,161 and the variance is 5,072,278,876, both
of which exceed the previous approximate values.

Using Kornya's method, $\lambda = 75(0.01)/0.99 = 0.757576$ and again the
severity distribution is unchanged. The mean is 57,449 and the variance is
5,097,853,535, which are the largest values of all. □
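The three sets of figures in this example are easy to reproduce. The sketch below is illustrative only: the data layout `groups` and the function name `compound_poisson_moments` are our own, and the portfolio (50 lives with amounts 50,000 or 100,000, and 25 lives with amounts 75,000 or 150,000, all with $q_j = 0.01$) is read off from the severity probabilities above.

```python
import math

# (number of lives, q_j, {loss amount: conditional probability})
groups = [
    (50, 0.01, {50_000: 0.7, 100_000: 0.3}),
    (25, 0.01, {75_000: 0.7, 150_000: 0.3}),
]


def compound_poisson_moments(rate):
    """Return (lambda, mean, variance) when each life gets lambda_j = rate(q_j)."""
    lam = sum(count * rate(q) for count, q, _ in groups)
    # Severity pf: weighted average of the individual severity pfs, as in (6.54).
    severity = {}
    for count, q, amounts in groups:
        for amount, prob in amounts.items():
            weight = count * rate(q) * prob / lam
            severity[amount] = severity.get(amount, 0.0) + weight
    first = sum(x * p for x, p in severity.items())
    second = sum(x * x * p for x, p in severity.items())
    # A compound Poisson sum has mean lambda*E(X) and variance lambda*E(X^2).
    return lam, lam * first, lam * second


matching_mean = compound_poisson_moments(lambda q: q)
no_loss = compound_poisson_moments(lambda q: -math.log(1.0 - q))
kornya = compound_poisson_moments(lambda q: q / (1.0 - q))

for name, (lam, mean, var) in [("mean matching", matching_mean),
                               ("no-loss matching", no_loss),
                               ("Kornya", kornya)]:
    print(f"{name:17s} lambda={lam:.6f} mean={mean:,.0f} variance={var:,.0f}")
```

Because every life has the same $q_j$, the severity distribution is the same under all three choices and only $\lambda$ changes, which is why the means and variances are ordered in the same way as the three values of $\lambda$.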

6.11.4 Exercises 

6.60 Derive (6.36) and (6.37). 

6.61 Demonstrate that the compound Poisson model given by $\lambda_j = q_j$ and
(6.52) produces a model with the same mean as the exact distribution but
with a larger variance. Then show that the one using $\lambda_j = -\ln(1 - q_j)$ must
produce a larger mean and even larger variance, and finally show that the one
using $\lambda_j = q_j/(1 - q_j)$ must produce the largest mean and variance of all.

6.62 Individual members of an insured group have independent claims. The
claim distribution has the statistics given in Table 6.25.

The premium for a group with future claims $S$ is the mean of $S$ plus 2 times
the standard deviation of $S$. If the genders of the members of a group of $m$
members are not known, the number of males is assumed to have a binomial
distribution with parameters $m$ and $q = 0.4$. Let $A$ be the premium for a
group of 100 for which the genders of the members are not known and let $B$
be the premium for a group of 40 males and 60 females. Determine $A/B$.

6.63 An insurance company assumes claim probabilities for persons covered 
by its group life insurance contracts as given in Table 6.26. 

A group of mutually independent lives has coverage of 1,000 per life. The
company assumes that 20% of the lives are smokers. Based on this assumption,
the premium is set equal to 110% of expected claims. If 30% of the lives are
smokers, the probability that claims will exceed the premium is less than

Table 6.25 Data for Exercise 6.62 



2 

4 


4 

10 
Table 6.26 Data for Exercise 6.63

Class        Probability of claim
Smoker       0.02
Nonsmoker    0.01


0.20. Using the normal approximation, determine the minimum number of 
lives which must be in the group. 


6.64 Based on the individual risk model with independent claims, the
cumulative distribution function of aggregate claims for a portfolio of life insurance
policies is as in Table 6.27.

One policy with face amount 100 and probability of claim 0.20 is increased 
in face amount to 200. Determine the probability that aggregate claims for 
the revised portfolio will not exceed 500. 



600 


0.96 


700 


1.00 




Table 6.28 Data for Exercise 6.65 
the maximum value of $x$ such that the variance of the compound Poisson
approximation is less than 4,500.

6.69 An insurance company sold one-year term life insurance on a group of
2,300 independent lives as given in Table 6.32.

The insurance company reinsures amounts in excess of 100,000 on each life.
The reinsurer wishes to charge a premium that is sufficient to guarantee that
it will lose money 5% of the time on such groups. Obtain the appropriate
premium by each of the following ways.

(a) Using a normal approximation to the aggregate cli 

(b) Using a lognormal approximation. 

(c) Using a gamma approximation. 

(d) Using the compound Poisson approximation w 


( e ) Carrying out the calculations exactly (using the n 
by De Pril or some other method). This requires 




Discrete-time 
ruin models 


7.1 INTRODUCTION 


The risk assumed with a portfolio of insurance contracts is difficult to assess,
but it is nevertheless important to attempt to do so in order to ensure the
viability of an insurance operation. The distribution of total claims over a




idealized in order to maintain mathematical simplicity. Consequently the
output from the analysis should not be viewed as a representation of absolute
reality, but rather as important additional information on the risk associated
with the portfolio of business. Such information is useful for long-run financial
planning and maintenance of the insurer's solvency.


Loss Models: From Data to Decisions, Second Edition. 

By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.
This chapter is organized into two parts. The first part (Section 7.2)
introduces process models. The appropriate definitions are made and the terms
of ruin theory defined. The second part (Section 7.3) analyzes discrete-time
models. This can be done with the tools presented in the previous chapters.
An analysis of continuous-time models is covered in the next chapter. This
requires an introduction to stochastic processes. Two processes are analyzed,
the compound Poisson process and Brownian motion. The compound Poisson
process has been the standard model for ruin analysis in actuarial science,
while Brownian motion has found considerable use in modern financial theory
and also can be used as an approximation to the compound Poisson process.


7.2 PROCESS MODELS FOR INSURANCE
7.2.1 Processes 

The major difference between this chapter and the earlier ones is that we now
want to view the evolution of the portfolio over time. With that in mind,
we define two kinds of processes. We note that, while processes that involve
random events are usually called stochastic processes, we will not employ
the modifier "stochastic" and instead trust that the context will make it clear
which processes are random and which are not.

Definition 7.1 A continuous-time process is denoted {Xt,t > 0}. If 
there are random elements, it is sufficient to specify the joint distribution of 


bitrary t. Many processes have correlations between me 
different times. 

{5t；t > 0} be the total losses paid from time 0 to 
ollective risk model of Chapter 6 may be used to describe 

stribution of (5 tl ,...,S t J, suppose h< ■•■ < ^et 
with S to = So = 0. Let the Wj have mdependent distri^ 
he collective risk model. The individual loss distributions 
while the frequency distribution would have a mean that 





Definition 7.5 A discrete-time process is denoted by $\{X_t;\ t = 0, 1, 2, \ldots\}$.

If there are random elements, it is sufficient to specify the joint distribution
of $(X_{t_1}, \ldots, X_{t_n})$ for integer $t_i$ and any $n$.

A discrete-time process can be derived from a continuous-time process by 
just writing down the values of Xt at integral times. In this chapter, all 
discrete-time processes will take measurements at the end of each observation 
period, such as a month, quarter, or year. 












> 0} is tlie premium process which measures all premiums (net 
collected up to time t, and {St\ t > 0} is the loss process, which 
losses paid up to time t. We make the following observations: 

^ be written or earned premiums, as appropriate; 
r be paid or incurred losses, again, as appropriate; and 

Y depend on S u for u < t. For example, dividends based on 
)le past loss experience may reduce the current premium. 

ble, though not necessary, to separate the frequency and severity 
of St. Let {Nt；t > 0} be the claims process which records the 
7.2.2.1 A discrete-time model Let the increment in the surplus process in
year $t$ be defined as

$$W_t = P_t - P_{t-1} - S_t + S_{t-1}, \quad t = 1, 2, \ldots.$$

Then the progression of surplus is

$$U_t = U_{t-1} + W_t, \quad t = 1, 2, \ldots.$$

It will be relatively easy to learn about the distribution of $\{U_t;\ t = 1, 2, \ldots\}$
provided that the random variable $W_t$ either is independent of the other $W_t$'s
or only depends on the value of $U_{t-1}$. The dependency of $W_t$ on $U_{t-1}$ allows
us to pay a dividend based on the surplus at the end of the previous year





















the appropriateness of the model. We will find that, if the Poisson process 
holds, infinite-horizon probabilities are easier to obtain. For other cases, the 
finite-horizon calculation may be easier. 

Although we have not defined notation to express them, there is another 
pair of limits. As the frequency with which surplus is checked increases (that 
is, the number of times per year), the discrete-time survival probabilities con¬ 
verge to their continuous-time counterparts. 

As this subsection refers to ruin, we close by defining the probability of 


Definition 7.10 The continuous-time, infinite-horizon ruin probability
is given by

$$\psi(u) = 1 - \phi(u)$$

and the other three ruin probabilities are defined and denoted in a similar
manner.
7.3 DISCRETE, FINITE-TIME RUIN PROBABILITIES


7.3.1 The discrete-time process


Let $P_t$ be the premium collected in the $t$th period and let $S_t$ be the losses
paid in the $t$th period. We also add one generalization. Let $C_t$ be any cash



The final assumption is that, given $U_{t-1}$, the random variable


as follows. First, define

$$W_t^* = \begin{cases} 0, & U_{t-1}^* < 0,\\ W_t, & U_{t-1}^* \ge 0,\end{cases}$$

$$U_t^* = U_{t-1}^* + W_t^*,$$

where the new process starts with $U_0^* = u$. In this case, the finite-horizon
survival probability is

$$\tilde{\phi}(u,\tau) = \Pr(U_\tau^* \ge 0).$$


The reason we need only check $U_t^*$ at time $\tau$ is that, once ruined, this process
is not allowed to become nonnegative. The following example illustrates this
distinction and is a preview of the method presented in detail in the next
subsection.


Example 7.11 Consider a process with an initial surplus of 2, a fixed annual
premium of 3, and losses of either 0 or 6 with probabilities 0.6 and 0.4,
respectively. There are no other cash flows. Determine $\tilde{\phi}(2,2)$.


There are only two possible values for $U_1$, 5 and −1, with probabilities 0.6
and 0.4. In each year, $W_t$ takes the values 3 and −3 with probabilities 0.6 and
0.4. For year 2, there are four possible ways for the process to end. They are
listed in Table 7.1. Then, $\tilde{\phi}(2,2) = 0.36 + 0.24 = 0.60$. Note that for $U_2$ the
process would continue for cases 3 and 4, producing values of 2 and −4. But
our process is not allowed to recover from ruin and so case 3 must be forced
to remain negative.
Table 7.1 Calculations for Example 7.11

Case   U_1*   W_2   W_2*   U_2*   Probability
1        5     3     3      8     0.36
2        5    −3    −3      2     0.24
3       −1     3     0     −1     0.24
4       −1    −3     0     −1     0.16
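The enumeration in this example can be mechanized. The following Python sketch (the function name is our own, and the approach of listing every path is only practical for very short horizons) freezes the surplus once it becomes negative and adds up the probabilities of the paths that never ruin.

```python
from itertools import product


def finite_horizon_survival(u, increments, horizon):
    """Pr(no ruin through `horizon` periods) for i.i.d. increments.

    `increments` is a list of (value of W_t, probability) pairs.  Once the
    surplus is negative it is frozen, as with the starred process.
    """
    survival = 0.0
    for path in product(increments, repeat=horizon):
        surplus, prob, ruined = u, 1.0, False
        for w, p in path:
            prob *= p
            if not ruined:
                surplus += w
                ruined = surplus < 0
        if not ruined:
            survival += prob
    return survival


# Example 7.11: initial surplus 2, premium 3, loss 0 or 6, so W_t is 3 or -3.
print(finite_horizon_survival(2, [(3, 0.6), (-3, 0.4)], 2))
```

For the data of Example 7.11 the two surviving paths are cases 1 and 2 of Table 7.1, so the printed value should be 0.60 up to rounding error.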


7.3.2 Evaluating the probability of ruin 

There are three ways to evaluate the ruin probability. One way that is
always available is simulation. Just as the aggregate loss distribution can be
simulated, the progress of surplus can also be simulated. For extremely
complicated models (for example, one encompassing medical benefits, including
hospitalization, prescription drugs, and outpatient visits as well as random
inflation, interest rates, and utilization rates), this may be the only way to
proceed. For more modest settings the other two methods can be used. The
first is a brute force method that has few restrictions. The second is an
inversion method that has some restrictions.

7.3.2.1 Evaluation by convolutions For any practical use of this method, the
distributions of all the random variables involved should be discrete and have
finite support. If they are not, some discrete approximation should be
constructed. The calculation is done recursively, using (7.1). For notational
purposes, suppose we have obtained the discrete pf of $U_{t-1}^*$. Then the ruin proba-

ation of n 
> 0 for 
that for 

=Pr(W t 
dues W t : 
ution. Fi 



Then, 
$$\Pr(U_t^* = x) = \Pr(U_{t-1}^* \ge 0 \text{ and } U_{t-1}^* + W_t = x)$$

=> 0 and U：_ x +W t = x\UU = Uj ) 
i=i 

x — Uj) 

=PT(uj+W t = x\UU = Uj)^ 

j=l 

Although these formulas look a bit intimidating, they are fairly easy to
implement. Consider the following example.


Example 7.12 Suppose that annual losses can assume the values 0, 2, 4, and
6, with probabilities 0.4, 0.3, 0.2, and 0.1, respectively. Further suppose that
the initial surplus is 2, and a premium of 2.5 is collected at the beginning of
each year. Interest is earned at 10% on any surplus available at the beginning

of th 
rebat 










is based on a premium of 2.5, interest of 0.45 (on the 
ion of the premium), a loss payment of 0, and a rebate of 
〔2 , 1), observe that the only value of which is below 
md so *0(2,1) = 0.1. It is also easy to see that the only 
have positive probability are those that have 2+w^k > 0. 
2 S for the distribution of t/f as in Table 7.3. 


Table 7.2 $w_{1,k}$ and $g_{1,k}$ for Example 7.12

k    w_{1,k}    g_{1,k}
1     2.45       0.4
2     0.95       0.3
3    −1.05       0.2
4    −3.05       0.1











remaming proDaDinty is at rr^u x : 
way to visualize year 2 is with a 
tations of u/and Wj,k- The entries ii 
as need be presented because they ； 
joint probability for any cell is the 
>bability in that cell. The addition 
Table 7.6 Probabilities for Example 7.13

j    u_j    f_j
1     0     0.0134
2     2     0.189225
3     4     0.258975
4     6     0.2568
5     8     0.0916


The probabilities total 0.81, the complement of $\tilde{\psi}(2,2)$. By the earlier
definition, the remaining 0.19 probability is associated with $U_2^* < 0$. □

It should be easy to see that the number of possible u values as well as 
the number of decimal places can increase rapidly. At some point, rounding 
would seem to be a good idea. A simple way to do this is to demand that at 
each period the only allowable u values are some multiple of h, a span that 
may need to increase from period to period. When probability is assigned to 
some value that is not a multiple of h, it is distributed to the two nearest 
values in a way that will preserve the mean (spreading to more values could 
preserve higher moments). 

Example 7.13 (Example 7.12 continued) Distribute the probabilities for the
surplus at the end of year 2 using a span of $h = 2$.

The probability of 0.04 at 1.645 must be distributed to the points 0 and
2. To preserve the mean, 0.355(0.04)/2 = 0.0071 is placed at zero and the
remaining 0.0329 is placed at 2. The expected value is 0.0071(0) + 0.0329(2) =
0.0658, which matches the original value of 0.04(1.645). The value 0.355 is
the distance from the point in question (1.645) to the next span point (2), and
the denominator is the span. The probability is then placed at the previous
span point. The resulting approximate distribution is given in Table 7.6. □
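The mean-preserving split is simple to automate. The sketch below is one possible implementation; the function name `spread_to_span` and the dictionary representation of the pf are illustrative choices, not part of the text.

```python
import math


def spread_to_span(pf, h):
    """Move probability onto multiples of h while preserving the mean.

    `pf` maps surplus values to probabilities.  Mass at a point u between
    span points lo and lo + h is split so that lo receives the fraction
    (lo + h - u) / h and lo + h receives the rest.
    """
    rounded = {}
    for u, p in pf.items():
        lo = math.floor(u / h) * h
        if lo == u:                          # already a span point
            rounded[lo] = rounded.get(lo, 0.0) + p
            continue
        hi = lo + h
        rounded[lo] = rounded.get(lo, 0.0) + p * (hi - u) / h
        rounded[hi] = rounded.get(hi, 0.0) + p * (u - lo) / h
    return rounded


# The single point treated in Example 7.13: probability 0.04 at 1.645, span 2.
piece = spread_to_span({1.645: 0.04}, 2)
print(piece)
```

For this point the function should place about 0.0071 at 0 and 0.0329 at 2, and the mean $0.04(1.645) = 0.0658$ is unchanged.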


7.3.2.2 Evaluation by inversion One of the strengths of the inversion method
is that the act of computing a convolution is reduced to a few multiplications.
This is true, provided that the random variables are independent. In this case
that means that $W_t$ is independent of $U_{t-1}$. We use a different approach with
regard to keeping track of ruin (earlier that was accomplished by freezing $U_t^*$
upon ruin). This idea could also be applied to the direct convolution approach.
2. Determine $\varphi_{2,t}(z) = E(e^{izW_t})$, the characteristic function of $W_t$.

3. Then $\varphi_{3,t}(z) = \varphi_{1,t}(z)\varphi_{2,t}(z)$ is the characteristic function of $U_{t-1}^{**} + W_t$.

4. Use inversion to determine $f_t(u)$, the pf of $U_{t-1}^{**} + W_t$.

5. Let $r_t = \Pr(U_{t-1}^{**} + W_t < 0)$. This is the probability that, given survival
to time $t - 1$, the portfolio is ruined at time $t$.

6. Then $f_t^{**}(u) = f_t(u)/(1 - r_t)$ for $u \ge 0$ is the pf of $U_t^{**}$.

7. The probability of ruin by time $t$ is then $\tilde{\psi}(u,t) = \tilde{\psi}(u,t-1) + r_t[1 - \tilde{\psi}(u,t-1)]$.
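The bookkeeping in steps 4 through 7 can be illustrated without any transform. The following Python sketch replaces the inversion in step 4 by a direct convolution (a deliberate simplification, so it is not the FFT method itself) and assumes, as the inversion method does, that $W_t$ is independent of the surplus. The function name and the dictionary representation are our own.

```python
def ruin_by_conditioning(u, increments, horizon):
    """Finite-horizon ruin probability via the survival-conditioned pf.

    `increments` is a list of (value of W_t, probability) pairs with W_t
    independent of the current surplus.
    """
    conditional_pf = {u: 1.0}       # pf of surplus given survival so far
    psi = 0.0                       # probability of ruin by the current time
    for _ in range(horizon):
        # Step 4 (by direct convolution here): pf of surplus plus W_t.
        f = {}
        for s, ps in conditional_pf.items():
            for w, pw in increments:
                f[s + w] = f.get(s + w, 0.0) + ps * pw
        # Step 5: probability of ruin this period, given survival so far.
        r = sum(p for s, p in f.items() if s < 0)
        # Step 7: update the probability of ruin by this time.
        psi = psi + r * (1.0 - psi)
        if r >= 1.0:
            break
        # Step 6: renormalize over the nonnegative values.
        conditional_pf = {s: p / (1.0 - r) for s, p in f.items() if s >= 0}
    return psi


# Data of Example 7.11: initial surplus 2, W_t equal to 3 or -3.
print(ruin_by_conditioning(2, [(3, 0.6), (-3, 0.4)], 2))
```

With the data of Example 7.11 the result should be 0.40, the complement of the survival probability 0.60 found there.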

The process is initiated by noting that the pf of U\ can be obtained directly 
Table 7.9 pf of $U_1^{**}$ for Example 7.14

u      Pr(U_1** = u)
0.5    2/9
2.5    3/9
4.5    4/9


In order for the FFT to work in a simple manner, it is best to have all
amounts be positive. This can be accomplished by adding 3.5 to each variable.
The shifted distributions are given in the second and third columns of Table
7.10. Anticipating that the shifted $U_1^{**} + W_2$ will take on values from 0 to 14
with a span of 2, we observe that eight values are required. This is already





Continuous-time 
ruin models 


8.1 INTRODUCTION 

In this chapter we turn to models that examine surplus continuously over time. 
Because these models tend to be difficult to analyze, we begin by restricting 
attention to models in which the number of claims has a Poisson distribution. 
In the discrete-time case we found that answers could be obtained by brute 
force. For the continuous case we find that exact, analytic solutions can be
obtained for some situations, and that approximations and an upper bound
can be obtained for many situations. In this section we introduce the Poisson
2. The process has stationary and independent increments.


3. The number of claims in an interval of length $t$ is Poisson distributed
with mean $\lambda t$. That is, for all $s, t > 0$ we have

$$\Pr(N_{t+s} - N_s = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}, \quad n = 0, 1, \ldots. \tag{8.1}$$


Stationary increments means that the distribution of the number of
claims in a fixed interval depends only on the length of the interval and not
on when the interval occurs so, for example, there is no trend effect.
Independent increments means that the number of claims in an interval is
statistically independent of the number of claims in any previous interval (not
overlapping the present interval). Together, stationary and independent
increments imply that the process can be thought of intuitively as starting
over at any point in time. Actually, the assumption of stationarity in
Condition 2 in the definition is not necessary because it is implied by Condition 3,
but it is stated for clarity.

An important property of the Poisson process is that the times between
claims are independent and identically exponentially distributed, each with
mean $1/\lambda$. To see this, let $W_j$ be the time between the $(j-1)$th and $j$th
claims for $j = 1, 2, \ldots$, where $W_1$ is the time of the first claim. Then,

$$\Pr(W_1 > t) = \Pr(N_t = 0) = e^{-\lambda t},$$

and so $W_1$ is exponential with mean $1/\lambda$. Also,

$$\begin{aligned}
\Pr(W_2 > t \mid W_1 = s) &= \Pr(W_1 + W_2 > s + t \mid W_1 = s)\\
&= \Pr(N_{t+s} = 1 \mid N_s = 1)\\
&= \Pr(N_{t+s} - N_s = 0 \mid N_s = 1)\\
&= \Pr(N_{t+s} - N_s = 0)
\end{aligned}$$

because the increments are independent. From Condition 3, we then have

$$\Pr(W_2 > t \mid W_1 = s) = e^{-\lambda t}.$$

Because this is true for all $s$, $\Pr(W_2 > t) = e^{-\lambda t}$ and $W_2$ is independent of
$W_1$. Similarly, $W_3, W_4, W_5, \ldots$ are independent and exponentially distributed,
each with mean $1/\lambda$.

Finally, we remark that, from a fixed point in time $t_0 \ge 0$ the time until the
next claim occurs is also exponentially distributed with mean $1/\lambda$, due to the
memoryless property of the exponential distribution. That is, if the $n$th claim
occurred $s$ time units before $t_0$, the probability that the next claim occurs at
least $t$ time units after $t_0$ is $\Pr(W_{n+1} > t + s \mid W_{n+1} > s) = e^{-\lambda t}$, which is the
same exponential survival function no matter what $s$ and $n$ happen to be.
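The link between exponential waiting times and Poisson counts can be checked by simulation. The sketch below is illustrative only: the rate 2, the interval length 3, the seed, and the number of replications are arbitrary choices. It generates claim times from independent exponential gaps with mean $1/\lambda$ and compares the sample mean and variance of the number of claims in $(0, t]$ with the Poisson value $\lambda t$.

```python
import random


def count_claims(rate, t, rng):
    """Number of claims in (0, t] when the gaps are exponential with mean 1/rate."""
    n = 0
    clock = rng.expovariate(rate)           # time of the first claim
    while clock <= t:
        n += 1
        clock += rng.expovariate(rate)      # add the next waiting time
    return n


rng = random.Random(12345)                  # fixed seed for reproducibility
rate, t, replications = 2.0, 3.0, 20000
counts = [count_claims(rate, t, rng) for _ in range(replications)]

sample_mean = sum(counts) / replications
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (replications - 1)
print(sample_mean, sample_var, rate * t)
```

Both sample statistics should be close to $\lambda t = 6$, as expected for a Poisson count.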


8.1.2 The continuous-time problem 

The model for claims payments will be the compound Poisson process. A 
formal definition follows. 

Definition 8.2 Let the number of claims process $\{N_t;\ t \ge 0\}$ be a Poisson
process with rate $\lambda$. Let the individual losses $\{X_1, X_2, \ldots\}$ be independent
and identically distributed positive random variables, independent of $N_t$, each
with cumulative distribution function $F(x)$ and mean $\mu < \infty$. Thus $X_j$ is the
amount of the $j$th loss. Let $S_t$ be the total loss in $(0, t]$. It is given by $S_t = 0$
if $N_t = 0$ and $S_t = \sum_{j=1}^{N_t} X_j$ if $N_t > 0$. Then, for fixed $t$, $S_t$ has a compound
Poisson distribution. The process $\{S_t;\ t \ge 0\}$ is said to be a compound
Poisson process. Because $\{N_t;\ t \ge 0\}$ has stationary and independent
increments, so does $\{S_t;\ t \ge 0\}$. Also,

$$E(S_t) = E(N_t)E(X_j) = (\lambda t)(\mu) = \lambda\mu t.$$

We assume that premiums are payable continuously at constant rate $c$ per
unit time. That is, the total net premium in $(0, t]$ is $ct$ and we ignore interest
for mathematical simplicity. We further assume that net premiums have a
positive loading, that is, $ct > E(S_t)$, which implies that $c > \lambda\mu$. Thus let

$$c = (1 + \theta)\lambda\mu, \tag{8.2}$$

where $\theta > 0$ is called the relative security loading or premium loading
factor.

For our model, we have now specified the loss and premium processes. The
surplus process is thus

$$U_t = u + ct - S_t, \quad t \ge 0,$$

where $u = U_0$ is the initial surplus. We say that ruin occurs if $U_t$ ever
becomes negative, and survival occurs otherwise. Thus, the infinite-time survival
probability is defined as

$$\phi(u) = \Pr(U_t \ge 0 \text{ for all } t \ge 0 \mid U_0 = u),$$

and the infinite-time ruin probability is

$$\psi(u) = 1 - \phi(u).$$

Our goal is to analyze $\phi(u)$ and/or $\psi(u)$.

8.2 THE ADJUSTMENT COEFFICIENT AND LUNDBERG'S INEQUALITY

In this section we determine a special quantity and then show that it can be
used to obtain a bound on the value of $\psi(u)$. While it is only a bound, it is
easy to obtain, and as an upper bound it provides a conservative estimate.
8.2.1 The adjustment coefficient

It is difficult to motivate the definition of the adjustment coefficient from a 
physical standpoint, so we just state it. We adopt the notational convention 
that X is an arbitrary claim size random variable in what follows. 

Definition 8.3 Let $t = \kappa$ be the smallest positive solution to the
equation

$$1 + (1+\theta)\mu t = M_X(t), \tag{8.3}$$

where $M_X(t) = E(e^{tX})$ is the moment generating function of the claim
severity random variable $X$. If such a value exists, it is called the
adjustment coefficient.


To see that there may be a solution, consider the two curves in the $(t, y)$
plane given by $y_1(t) = 1 + (1+\theta)\mu t$ and
$y_2(t) = M_X(t) = E(e^{tX})$. Now, $y_1(t)$ is a straight line with positive
slope $(1+\theta)\mu$. The mgf may not exist at all or may exist only for
some values of $t$. Assume for this discussion that the mgf exists for all
nonnegative $t$. Then $y_2'(t) = E(Xe^{tX}) > 0$ and
$y_2''(t) = E(X^2e^{tX}) > 0$. Because $y_1(0) = y_2(0) = 1$, the two curves
intersect when $t = 0$. But
$y_2'(0) = E(X) = \mu < (1+\theta)\mu = y_1'(0)$. Thus, as $t$ increases
from 0 the curve $y_2(t)$ initially falls below $y_1(t)$, but because
$y_2'(t) > 0$ and $y_2''(t) > 0$, eventually $y_2(t)$ will cross $y_1(t)$ at
a point $\kappa > 0$. The point $\kappa$ is the adjustment coefficient.

We remark that there may not be a positive solution to (8.3), for example,

Fig. 8.1 Left and right sides of the adjustment coefficient equation.

For the gamma distribution, $\mu = 2\beta$ and

$$M_X(t) = \int_0^\infty e^{tx} f(x)\,dx = (1-\beta t)^{-2}.$$

Then from (8.3) we obtain

$$1 + 6\kappa\beta = (1-\beta\kappa)^{-2},$$

which may be rearranged as

$$6\beta^3\kappa^3 - 11\beta^2\kappa^2 + 4\beta\kappa = 0.$$

This is easily factored as

$$\kappa\beta(2\kappa\beta - 1)(3\kappa\beta - 4) = 0.$$


adjustment coefficient $\kappa$. To find such a value, note that for (8.3)
we may write

$$\begin{aligned}
1 + (1+\theta)\mu\kappa &= E(e^{\kappa X}) \\
&= E(1 + \kappa X + \tfrac{1}{2}\kappa^2X^2 + \cdots) \\
&> E(1 + \kappa X + \tfrac{1}{2}\kappa^2X^2) \\
&= 1 + \kappa\mu + \tfrac{1}{2}\kappa^2E(X^2).
\end{aligned}$$

Then subtraction of $1 + \kappa\mu$ from both sides of the inequality and
division by $\kappa$ results in

$$\kappa < \frac{2\theta\mu}{E(X^2)}. \tag{8.5}$$

The right-hand side of (8.5) is usually a satisfactory initial value of
$\kappa$. Other inequalities for $\kappa$ are given in the exercises.


Example 8.6 The aggregate loss random variable has variance equal to three 
times the mean. Determine a bound on the adjustment coefficient. 


For the compound Poisson distribution, $E(S_t) = \lambda\mu t$ and
$\mathrm{Var}(S_t) = \lambda t E(X^2)$, and so $E(X^2) = 3\mu$. Hence, from
(8.5), $\kappa < 2\theta/3$. □


Define

$$H(t) = 1 + (1+\theta)\mu t - M_X(t). \tag{8.6}$$

Then the adjustment coefficient $\kappa > 0$ satisfies $H(\kappa) = 0$. To
solve this equation, use the Newton-Raphson formula,

$$\kappa_{j+1} = \kappa_j - \frac{H(\kappa_j)}{H'(\kappa_j)},$$

where

$$H'(t) = (1+\theta)\mu - M_X'(t),$$

beginning with an initial value $\kappa_0$. Because $H(0) = 0$, care must be
taken so as not to converge to the value 0.


Example 8.7 Suppose the Poisson parameter is $\lambda = 4$ and the premium
rate is $c = 7$. Further suppose the individual loss amount distribution is
given by

$$\Pr(X = 1) = 0.6, \quad \Pr(X = 2) = 0.4.$$

Determine the adjustment coefficient.

We have

$$\mu = E(X) = (1)(0.6) + (2)(0.4) = 1.4$$

and

$$E(X^2) = (1)^2(0.6) + (2)^2(0.4) = 2.2.$$

Then $\theta = c(\lambda\mu)^{-1} - 1 = 7(5.6)^{-1} - 1 = 0.25$. From (8.5),
we know that $\kappa$ must be less than
$\kappa_0 = 2(0.25)(1.4)/2.2 = 0.3182$. Now,

$$M_X(t) = 0.6e^{t} + 0.4e^{2t},$$

and so from (8.6)

$$H(t) = 1 + 1.75t - 0.6e^{t} - 0.4e^{2t}.$$

We also have

$$M_X'(t) = (1e^{t})(0.6) + (2e^{2t})(0.4),$$

and so

$$H'(t) = 1.75 - 0.6e^{t} - 0.8e^{2t}.$$

Our initial guess is $\kappa_0 = 0.3182$. Then $H(\kappa_0) = -0.02381$ and
$H'(\kappa_0) = -0.5865$. Thus, an updated estimate of $\kappa$ is

$$\kappa_1 = 0.3182 - \frac{-0.02381}{-0.5865} = 0.2776.$$

Then $H(0.2776) = -0.003091$, $H'(0.2776) = -0.4358$, and

$$\kappa_2 = 0.2776 - \frac{-0.003091}{-0.4358} = 0.2705.$$

Continuing in this fashion, we get $\kappa_3 = 0.2703$, $\kappa_4 = 0.2703$,
and so the adjustment coefficient is $\kappa = 0.2703$ to four decimal places
of accuracy. □
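
The Newton-Raphson iteration of Example 8.7 can be sketched in a few lines of
code. The function name `newton_adjustment_coefficient` and its arguments are
illustrative, not from the text. The sketch assumes the mgf exists at the
starting value, which is the upper bound (8.5); starting above the root keeps
the iteration away from the trivial solution $t = 0$.

```python
import math


def newton_adjustment_coefficient(theta, mu, second_moment, mgf, mgf_prime,
                                  tol=1e-12, max_iter=100):
    """Solve H(t) = 1 + (1 + theta)*mu*t - M_X(t) = 0 for the positive root."""
    slope = (1.0 + theta) * mu
    kappa = 2.0 * theta * mu / second_moment  # initial value from bound (8.5)
    for _ in range(max_iter):
        h = 1.0 + slope * kappa - mgf(kappa)   # H(kappa), equation (8.6)
        h_prime = slope - mgf_prime(kappa)     # H'(kappa)
        step = h / h_prime
        kappa -= step
        if abs(step) < tol:
            break
    return kappa


# Data of Example 8.7: lambda = 4, c = 7, Pr(X=1) = 0.6, Pr(X=2) = 0.4.
lam, c = 4.0, 7.0
sizes = [1.0, 2.0]
probs = [0.6, 0.4]
mu = sum(p * x for x, p in zip(sizes, probs))
second_moment = sum(p * x * x for x, p in zip(sizes, probs))
theta = c / (lam * mu) - 1.0


def mgf(t):
    return sum(p * math.exp(t * x) for x, p in zip(sizes, probs))


def mgf_prime(t):
    return sum(p * x * math.exp(t * x) for x, p in zip(sizes, probs))


kappa = newton_adjustment_coefficient(theta, mu, second_moment, mgf, mgf_prime)
print(f"kappa = {kappa:.4f}")
```

For a severity distribution whose mgf exists only on a bounded interval, the
bound (8.5) may fall outside that interval, in which case a bracketing method
such as bisection is a safer choice.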


There is another form for the adjustment coefficient equation (8.3), namely

$$1 + \theta = \int_0^\infty e^{\kappa x} f_e(x)\,dx, \tag{8.7}$$

where

$$f_e(x) = \frac{1-F(x)}{\mu}, \quad x > 0, \tag{8.8}$$

is the equilibrium probability density function discussed in Section 4.3.3.

To see that (8.7) is equivalent to (8.3), note that

$$\int_0^\infty e^{\kappa x} f_e(x)\,dx = \frac{M_X(\kappa) - 1}{\mu\kappa}$$

from Exercise 4.15. Thus replacement of $M_X(\kappa)$ by
$1 + (1+\theta)\mu\kappa$ in this expression yields (8.7).

$$0 \le \lim_{u\to\infty}\psi(u) \le \lim_{u\to\infty} e^{-\kappa u} = 0,$$

which establishes (8.10). We then have that for the survival probability

$$\phi(\infty) = 1. \tag{8.12}$$

8.2.3 Exercises 

8.1 Calculate the adjustment coefficient if $\theta = 0.32$ and the claim
size distribution is the same as that of Example 8.5.

8.2 Calculate the adjustment coefficient if the individual loss size density is 

f(x) = a: > 0. 

8.3 Calculate the adjustment coefficient if $c = 3$, $\lambda = 4$, and the
individual loss size density is
$f(x) = e^{-2x} + \tfrac{3}{2}e^{-3x}$, $x > 0$. Do not use an iterative
numerical procedure.

8.4 If $c = 2.99$, $\lambda = 1$, and the individual loss size distribution
is given by $\Pr(X = 1) = 0.2$, $\Pr(X = 2) = 0.3$, and $\Pr(X = 3) = 0.5$,
use the Newton-Raphson procedure to numerically obtain the adjustment
coefficient.

8.5 Repeat Exercise 8.3 using the Newton-Raphson procedure beginning with
an initial estimate based on (8.5).

8.6 Suppose that $E(X^3)$ is known where $X$ is a generic individual loss
amount random variable. Prove that the adjustment coefficient $\kappa$
satisfies

$$\kappa < \frac{-3E(X^2) + \sqrt{9[E(X^2)]^2 + 24\theta\mu E(X^3)}}{2E(X^3)}.$$

Also prove that the right-hand side of this inequality is strictly less than
the bound given in (8.5), namely $2\theta\mu/E(X^2)$.

8.7 Recall that, if $g''(x) \ge 0$, Jensen's inequality implies
$E[g(Y)] \ge g[E(Y)]$. Also, from Section 4.3.3,

$$\int_0^\infty x f_e(x)\,dx = \frac{E(X^2)}{2\mu},$$

where $f_e(x)$ is defined by (8.8).

(a) Use (8.7) and the above results to show that

8.10 Continue the previous exercise. Use integration by parts to show that

$$\int_x^\infty e^{\kappa y}\,dF(y) = e^{\kappa x}S(x)
  + \kappa\int_x^\infty e^{\kappa y}S(y)\,dy, \quad x \ge 0.$$

8.11 Suppose $F(x)$ has a decreasing hazard rate (Section 4.3.3). Prove that
$S(y) \ge S(x)S(y - x)$, $x \ge 0$, $y \ge x$. Then use Exercise 8.10 to show
that (8.13) is satisfied with $\rho^{-1} = E(e^{\kappa X})$. Use (8.3) to
conclude that

$$\psi(x) \le [1 + (1+\theta)\kappa\mu]^{-1}e^{-\kappa x}, \quad x \ge 0.$$

8.12 Suppose $F(x)$ has a hazard rate $\mu(x) = -(d/dx)\ln S(x)$ which
satisfies $\mu(x) \le m < \infty$, $x \ge 0$. Use the result in Exercise 8.10
to show that (8.13) is satisfied with $\rho = 1 - \kappa/m$ and thus

$$\psi(x) \le (1 - \kappa/m)e^{-\kappa x}, \quad x \ge 0.$$

Hint: Show that, for $y > x$, $S(y) \ge S(x)e^{-(y-x)m}$.


8.3 AN INTEGRODIFFERENTIAL EQUATION 

We now consider the problem of finding an explicit formula for the ruin
probability $\psi(u)$ or (equivalently) the survival probability $\phi(u)$.
It will be useful in what follows to consider a slightly more general
function.

Definition 8.9 $G(u, y) = \Pr$(ruin occurs with initial reserve $u$ and
deficit immediately after ruin occurs is at most $y$), $u \ge 0$, $y \ge 0$.

For the event described, the surplus immediately after ruin is between 0
and $-y$. We then have

$$\psi(u) = \lim_{y\to\infty} G(u, y), \quad u \ge 0. \tag{8.14}$$

We have the following result.

Theorem 8.10 The function $G(u, y)$ satisfies the equation

$$\frac{\partial}{\partial u}G(u,y) = \frac{\lambda}{c}G(u,y)
  - \frac{\lambda}{c}\int_0^u G(u-x,y)\,dF(x)
  - \frac{\lambda}{c}[F(u+y) - F(u)], \quad u \ge 0. \tag{8.15}$$

Proof: Let us again consider what happens with the first claim. The time of
the first claim has the exponential probability density function
$\lambda e^{-\lambda t}$, and the surplus available to pay the first claim at
time $t$ is $u + ct$. If the amount of the claim is $x$, where
$0 \le x \le u + ct$, then the first claim does not cause ruin but reduces
the surplus to $u + ct - x$. By the stationary and independent increments
property, ruin with a deficit of at most $y$ would then occur thereafter with
probability $G(u + ct - x, y)$. The only other possibility for ruin to occur
with a deficit of at most $y$ is that the first claim does cause ruin, that
is, it occurs for an amount $x$ where $x > u + ct$ but $x \le u + ct + y$
because, if $x > u + ct + y$, the deficit would then exceed $y$. The
probability that the claim amount $x$ satisfies $u + ct < x \le u + ct + y$
is $F(u + ct + y) - F(u + ct)$. Consequently, by the law of total
probability, we have

$$G(u,y) = \int_0^\infty \left[\int_0^{u+ct} G(u+ct-x,y)\,dF(x)
  + F(u+ct+y) - F(u+ct)\right]\lambda e^{-\lambda t}\,dt.$$

We wish to differentiate this expression with respect to $u$, and in order to
do so, it is convenient to change the variable of integration from $t$ to
$z = u + ct$. Thus, $t = (z - u)/c$ and $dt = dz/c$. Then, with this change
of variable we have

$$G(u,y) = \frac{\lambda}{c}e^{(\lambda/c)u}\int_u^\infty e^{-(\lambda/c)z}
  \left[\int_0^z G(z-x,y)\,dF(x) + F(z+y) - F(z)\right]dz.$$

Recall from the fundamental theorem of calculus that, if $k$ is a function,
then $\frac{d}{du}\int_u^\infty k(z)\,dz = -k(u)$, and we may differentiate
with the help of the product rule to obtain

$$\frac{\partial}{\partial u}G(u,y) = \frac{\lambda}{c}G(u,y)
  - \frac{\lambda}{c}\left[\int_0^u G(u-x,y)\,dF(x) + F(u+y) - F(u)\right],$$

from which the result follows. □


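We now determine an explicit formula for $G(0, y)$.

Theorem 8.11 The function $G(0, y)$ is given by

$$G(0,y) = \frac{\lambda}{c}\int_0^y [1-F(x)]\,dx, \quad y \ge 0. \tag{8.16}$$

Proof: First note that

$$0 \le G(u,y) \le \psi(u) \le e^{-\kappa u},$$

and thus

$$0 \le G(\infty,y) = \lim_{u\to\infty}G(u,y)
  \le \lim_{u\to\infty}e^{-\kappa u} = 0,$$

and therefore $G(\infty, y) = 0$. Also,

$$\int_0^\infty G(u,y)\,du \le \int_0^\infty e^{-\kappa u}\,du
  = \kappa^{-1} < \infty.$$

Thus let $\tau(y) = \int_0^\infty G(u,y)\,du$ and we know that
$0 < \tau(y) < \infty$. Then, integrate (8.15) with respect to $u$ from 0 to
$\infty$ to get, using the above,

$$-G(0,y) = \frac{\lambda}{c}\tau(y)
  - \frac{\lambda}{c}\int_0^\infty\!\!\int_0^u G(u-x,y)\,dF(x)\,du
  - \frac{\lambda}{c}\int_0^\infty [F(u+y)-F(u)]\,du.$$

Interchanging the order of integration in the double integral yields

$$G(0,y) = -\frac{\lambda}{c}\tau(y)
  + \frac{\lambda}{c}\int_0^\infty\!\!\int_x^\infty G(u-x,y)\,du\,dF(x)
  + \frac{\lambda}{c}\int_0^\infty [F(u+y)-F(u)]\,du,$$

and changing the variable of integration from $u$ to $v = u - x$ in the inner
integral of the double integral results in

$$\begin{aligned}
G(0,y) &= -\frac{\lambda}{c}\tau(y)
  + \frac{\lambda}{c}\int_0^\infty\!\!\int_0^\infty G(v,y)\,dv\,dF(x)
  + \frac{\lambda}{c}\int_0^\infty [F(u+y)-F(u)]\,du \\
&= -\frac{\lambda}{c}\tau(y)
  + \frac{\lambda}{c}\int_0^\infty \tau(y)\,dF(x)
  + \frac{\lambda}{c}\int_0^\infty [F(u+y)-F(u)]\,du.
\end{aligned}$$

Because $\int_0^\infty dF(x) = 1$, the first two terms on the right-hand side
cancel, and

$$\begin{aligned}
G(0,y) &= \frac{\lambda}{c}\int_0^\infty [F(u+y)-F(u)]\,du \\
&= \frac{\lambda}{c}\int_0^\infty [1-F(u)]\,du
  - \frac{\lambda}{c}\int_0^\infty [1-F(u+y)]\,du.
\end{aligned}$$

Then change the variable from $u$ to $x = u$ in the first integral and from
$u$ to $x = u + y$ in the second integral. The result is

$$G(0,y) = \frac{\lambda}{c}\int_0^\infty [1-F(x)]\,dx
  - \frac{\lambda}{c}\int_y^\infty [1-F(x)]\,dx
  = \frac{\lambda}{c}\int_0^y [1-F(x)]\,dx.$$ □

We remark that (8.16) holds even if there is no adjustment coefficient. The
function $G(0, y)$ is of considerable interest in its own right, but for now
we shall return to the analysis of $\phi(u)$.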


Theorem 8.12 The survival probability with no initial reserve satisfies

$$\phi(0) = \frac{\theta}{1+\theta}. \tag{8.17}$$

Proof: Recall that $\mu = \int_0^\infty [1 - F(x)]\,dx$ and note that from
(8.16)

$$\psi(0) = \lim_{y\to\infty}G(0,y)
  = \frac{\lambda}{c}\int_0^\infty [1-F(x)]\,dx
  = \frac{\lambda\mu}{c} = \frac{1}{1+\theta}.$$

Thus, $\phi(0) = 1 - \psi(0) = \theta/(1+\theta)$. □

The general solution for $\phi(u)$ may be obtained from the following
integrodifferential equation subject to the initial condition (8.17).

Theorem 8.13 The probability of ultimate survival $\phi(u)$ satisfies

$$\phi'(u) = \frac{\lambda}{c}\phi(u)
  - \frac{\lambda}{c}\int_0^u \phi(u-x)\,dF(x), \quad u \ge 0. \tag{8.18}$$

Proof: From (8.15) with $y \to \infty$ and (8.14),

$$\psi'(u) = \frac{\lambda}{c}\psi(u)
  - \frac{\lambda}{c}\int_0^u \psi(u-x)\,dF(x)
  - \frac{\lambda}{c}[1-F(u)], \quad u \ge 0. \tag{8.19}$$

In terms of the survival probability $\phi(u) = 1 - \psi(u)$, (8.19) may be
expressed as

$$\begin{aligned}
-\phi'(u) &= \frac{\lambda}{c}[1-\phi(u)]
  - \frac{\lambda}{c}\int_0^u [1-\phi(u-x)]\,dF(x)
  - \frac{\lambda}{c}[1-F(u)] \\
&= -\frac{\lambda}{c}\phi(u) - \frac{\lambda}{c}\int_0^u dF(x)
  + \frac{\lambda}{c}\int_0^u \phi(u-x)\,dF(x) + \frac{\lambda}{c}F(u) \\
&= -\frac{\lambda}{c}\phi(u) + \frac{\lambda}{c}\int_0^u \phi(u-x)\,dF(x)
\end{aligned}$$

because $F(u) = \int_0^u dF(x)$. The result then follows. □

It is largely a matter of taste whether one uses (8.18) or (8.19). We shall
often use (8.18) because it is slightly simpler algebraically. Unfortunately,
the solution for general $F(x)$ is rather complicated and we shall defer this
general solution to Section 8.4. At this point we shall obtain the solution
for one special choice of $F(x)$.

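Example 8.14 (The exponential distribution) Suppose, as in Example 8.4,
that $F(x) = 1 - e^{-x/\mu}$, $x \ge 0$. Determine $\phi(u)$.

In this case (8.18) becomes

$$\phi'(u) = \frac{\lambda}{c}\phi(u)
  - \frac{\lambda}{\mu c}\int_0^u \phi(u-x)e^{-x/\mu}\,dx.$$

Change variables in the integral from $x$ to $y = u - x$ to obtain

$$\phi'(u) = \frac{\lambda}{c}\phi(u)
  - \frac{\lambda}{\mu c}e^{-u/\mu}\int_0^u \phi(y)e^{y/\mu}\,dy. \tag{8.20}$$

We wish to eliminate the integral term in (8.20), so we differentiate with
respect to $u$. This gives

$$\phi''(u) = \frac{\lambda}{c}\phi'(u)
  + \frac{\lambda}{\mu^2 c}e^{-u/\mu}\int_0^u \phi(y)e^{y/\mu}\,dy
  - \frac{\lambda}{\mu c}\phi(u).$$

The integral term can be eliminated using (8.20) to produce

$$\phi''(u) = \frac{\lambda}{c}\phi'(u) - \frac{\lambda}{\mu c}\phi(u)
  + \frac{1}{\mu}\left[\frac{\lambda}{c}\phi(u) - \phi'(u)\right],$$

which simplifies to

$$\phi''(u) = \left(\frac{\lambda}{c} - \frac{1}{\mu}\right)\phi'(u)
  = -\frac{\theta}{\mu(1+\theta)}\phi'(u).$$

After multiplication by the integrating factor $e^{\theta u/[\mu(1+\theta)]}$
this may be rewritten as

$$\frac{d}{du}\left[e^{\theta u/[\mu(1+\theta)]}\phi'(u)\right] = 0.$$

Integrating with respect to $u$ gives

$$e^{\theta u/[\mu(1+\theta)]}\phi'(u) = K_1.$$

From (8.20) with $u = 0$ and using (8.17), we thus have

$$K_1 = \phi'(0) = \frac{\lambda}{c}\phi(0)
  = \frac{\lambda}{c}\,\frac{\theta}{1+\theta}
  = \frac{\lambda}{\lambda\mu(1+\theta)}\,\frac{\theta}{1+\theta}
  = \frac{\theta}{\mu(1+\theta)^2}.$$

Thus,

$$\phi'(u) = \frac{\theta}{\mu(1+\theta)^2}
  \exp\left[-\frac{\theta u}{\mu(1+\theta)}\right],$$

which may be integrated again to give

$$\phi(u) = -\frac{1}{1+\theta}
  \exp\left[-\frac{\theta u}{\mu(1+\theta)}\right] + K_2.$$

Now (8.17) gives $\phi(0) = \theta/(1+\theta)$, and so with $u = 0$ we have
$K_2 = 1$. Thus

$$\phi(u) = 1 - \frac{1}{1+\theta}
  \exp\left[-\frac{\theta u}{\mu(1+\theta)}\right]$$

is the required probability. □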
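
As a numerical sanity check on the solution of Example 8.14, the closed form
can be substituted back into (8.18). The sketch below is not from the text:
the function names are illustrative, the derivative is approximated by a
central difference, and the convolution integral by the trapezoidal rule, so
the residual is expected to be near zero only up to discretization error.

```python
import math


def phi_exponential(u, theta, mu):
    """Survival probability for exponential claims (closed form of Example 8.14)."""
    return 1.0 - math.exp(-theta * u / (mu * (1.0 + theta))) / (1.0 + theta)


def residual_8_18(u, theta, mu, lam=1.0, n=20000, h=1e-5):
    """Left side minus right side of (8.18) at the point u."""
    c = (1.0 + theta) * lam * mu  # premium rate from (8.2)
    # Central-difference approximation of phi'(u).
    lhs = (phi_exponential(u + h, theta, mu)
           - phi_exponential(u - h, theta, mu)) / (2.0 * h)
    # Trapezoidal rule for the integral of phi(u - x) dF(x) over [0, u],
    # where dF(x) = exp(-x/mu)/mu dx for the exponential distribution.
    dx = u / n
    total = 0.0
    for i in range(n + 1):
        x = i * dx
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * phi_exponential(u - x, theta, mu) * math.exp(-x / mu) / mu
    integral = total * dx
    rhs = (lam / c) * phi_exponential(u, theta, mu) - (lam / c) * integral
    return lhs - rhs


residual = residual_8_18(u=3.0, theta=0.25, mu=2.0)
print(f"residual of (8.18): {residual:.2e}")
```

The same check applies to any candidate solution: only `phi_exponential` and
the density inside the integral would need to change.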


8.3.1 Exercises

8.13 Suppose that the claim size distribution is exponential with
$F(x) = 1 - e^{-x/\mu}$ as in Example 8.14.

(a) Prove, using (8.15), that $G(u, y) = \psi(u)F(y)$ in this case.

(b) Prove that the distribution of the deficit immediately after ruin
occurs, given that ruin does occur, has the same exponential
distribution given above.

8.14 This exercise involves the derivation of integral equations called
defective renewal equations for $G(u, y)$ and $\psi(u)$. These may be used to
derive various properties of these functions.

(a) Integrate (8.15) over $u$ from 0 to $t$ and use (8.16) to show that

$$\begin{aligned}
G(t,y) ={}& \frac{\lambda}{c}\Lambda(t,y)
  - \frac{\lambda}{c}\int_0^t \Lambda(t-x,y)\,dF(x) \\
&+ \frac{\lambda}{c}\int_0^y [1-F(x)]\,dx
  - \frac{\lambda}{c}\int_0^t [1-F(u)]\,du \\
&+ \frac{\lambda}{c}\int_0^t [1-F(u+y)]\,du,
\end{aligned}$$

where $\Lambda(x, y) = \int_0^x G(v, y)\,dv$.

(b) Use integration by parts on the integral
$\int_0^t \Lambda(t-x, y)\,dF(x)$ to show from (a) that

$$\begin{aligned}
G(t,y) ={}& \frac{\lambda}{c}\Lambda(t,y)
  - \frac{\lambda}{c}\int_0^t G(t-x,y)F(x)\,dx \\
&+ \frac{\lambda}{c}\int_0^{y+t} [1-F(x)]\,dx
  - \frac{\lambda}{c}\int_0^t [1-F(u)]\,du.
\end{aligned}$$

(c) Prove using (b) that

$$G(u,y) = \frac{\lambda}{c}\int_0^u G(u-x,y)[1-F(x)]\,dx
  + \frac{\lambda}{c}\int_u^{y+u} [1-F(x)]\,dx.$$

(d) Prove that

$$\psi(u) = \frac{\lambda}{c}\int_0^u \psi(u-x)[1-F(x)]\,dx
  + \frac{\lambda}{c}\int_u^\infty [1-F(x)]\,dx.$$

8.4 THE MAXIMUM AGGREGATE LOSS

Beginning with an initial reserve $u$, the probability that the surplus will
ever fall below the initial level $u$ is $\psi(0)$ because the surplus
process has stationary and independent increments. Thus the probability of
dropping below the initial level $u$ is the same for all $u$, but we know
that when $u = 0$ it is $\psi(0)$.

The key result is that, given that there is a drop below the initial level
$u$, the random variable $Y$ which represents the amount of this initial drop
has the equilibrium probability density function $f_e(y)$, where $f_e(y)$ is
given by (8.8).

Theorem 8.15 Given that there is a drop below the initial level $u$, the
random variable $Y$ which represents the amount of this initial drop has
probability density function $f_e(y) = [1 - F(y)]/\mu$.

Proof: Recall the function $G(u, y)$ from Definition 8.9. Because the surplus
process has stationary and independent increments, $G(0, y)$ also represents
the probability that the surplus drops below its initial level, and the
amount of this drop is at most $y$. Thus, using Theorem 8.11, the amount of
the drop, given that there is a drop, has cumulative distribution function

$$\Pr(Y \le y) = \frac{G(0,y)}{\psi(0)}
  = \frac{\frac{\lambda}{c}\int_0^y [1-F(u)]\,du}{\frac{\lambda}{c}\mu}
  = \frac{1}{\mu}\int_0^y [1-F(u)]\,du,$$

and the result follows by differentiation. □

If there is a drop of $y$, the surplus immediately after the drop is $u - y$,
and because the surplus process has stationary and independent increments,
ruin occurs thereafter with probability $\psi(u - y)$, provided $u - y$ is
nonnegative; otherwise ruin would have already occurred. The probability of a
second drop is $\psi(0)$, and the amount of the second drop also has density
$f_e(y)$ and is independent of the first drop. Due to the memoryless property
of the Poisson process, the process "starts over" after each drop. Therefore,
the total number of drops $K$ is geometrically distributed, that is,
$\Pr(K = 0) = 1 - \psi(0)$, $\Pr(K = 1) = [1 - \psi(0)]\psi(0)$, and more
generally,

$$\Pr(K = k) = [1-\psi(0)][\psi(0)]^k
  = \frac{\theta}{1+\theta}\left(\frac{1}{1+\theta}\right)^k,
  \quad k = 0, 1, 2, \ldots,$$

because $\psi(0) = 1/(1+\theta)$. The usual geometric parameter $\beta$ (in
Appendix B) is thus $1/\theta$ in this case.

After a drop, the surplus immediately begins to increase again. Thus, the
lowest level of the surplus is $u - L$, where $L$, called the maximum
aggregate loss, is the total of all the drops. The drop amounts
$Y_1, Y_2, \ldots$ are independent and identically distributed random
variables [each with density $f_e(y)$]. Because the number of drops is $K$,
it follows that

$$L = Y_1 + Y_2 + \cdots + Y_K$$

with $L = 0$ if $K = 0$. Thus, $L$ is a compound geometric random variable
with "claim size density" $f_e(y)$.

Clearly, ultimate survival beginning with initial reserve $u$ occurs if the
maximum aggregate loss $L$ does not exceed $u$, that is,

$$\phi(u) = \Pr(L \le u), \quad u \ge 0.$$

Let $F_e^{*0}(y) = 0$ if $y < 0$ and 1 if $y \ge 0$. Also
$F_e^{*k}(y) = \Pr\{Y_1 + Y_2 + \cdots + Y_k \le y\}$ is the cumulative
distribution function of the $k$-fold convolution of the distribution of $Y$
with itself. We then have the general solution, namely,

$$\phi(u) = \sum_{k=0}^\infty \frac{\theta}{1+\theta}
  \left(\frac{1}{1+\theta}\right)^k F_e^{*k}(u), \quad u \ge 0.$$

In terms of the ruin probability, this general solution may be expressed as

$$\psi(u) = \sum_{k=1}^\infty \frac{\theta}{1+\theta}
  \left(\frac{1}{1+\theta}\right)^k S_e^{*k}(u), \quad u \ge 0,$$

where $S_e^{*k}(y) = 1 - F_e^{*k}(y)$. Evidently, $\psi(u)$ is the survival
function associated with the compound geometric random variable $L$, and
analytic solutions may be obtained in a manner similar to those obtained in
Section 6.4. An analytic solution for the important Erlang mixture claim
severity pdf$^3$

$$f(x) = \sum_{k=1}^\infty q_k
  \frac{\beta^{-k}x^{k-1}e^{-x/\beta}}{(k-1)!}, \quad x > 0,$$

where the $q_k$ are positive weights that sum to 1, may be found in Exercise
8.17, and for some other claim severity distributions in the next section.

We may also compute ruin probabilities numerically by computing the
cumulative distribution function of a compound geometric distribution using
any of the techniques described in Chapter 6.

Example 8.16 Suppose the individual loss distribution is Pareto with
$\alpha = 3$ and a mean of 500. Let the security loading be $\theta = 0.2$.
Determine $\phi(u)$ for $u = 100, 200, 300, \ldots$.

3 Any continuous positive probability density function may be approximated arbitrarily

Table 8.1 Survival probabilities, Pareto losses

       u    phi(u)          u    phi(u)
     100    0.193       5,000    0.687
     200    0.216       7,500    0.787
     300    0.238      10,000    0.852
     500    0.276      15,000    0.923
   1,000    0.355      20,000    0.958
   2,000    0.473      25,000    0.975
   3,000    0.561

We first require the cdf, $F_e(u)$. It can be found from its pdf

$$f_e(u) = \frac{1-F(u)}{\mu}
  = \frac{1}{500}\left(\frac{1{,}000}{1{,}000+u}\right)^3,$$

which happens to be the density function of a Pareto distribution with
$\alpha = 2$ and a mean of 1,000. This new Pareto distribution is the
severity distribution for a compound geometric distribution where the
parameter is $\beta = 1/\theta = 5$. The compound geometric distribution can
be evaluated using any of the techniques in Chapter 6. We used the recursive
formula with a discretization which preserves the mean and a span of $h = 5$.
The cumulative probabilities are then obtained by summing the discrete
probabilities generated by the recursive formula. The values appear in
Table 8.1. □
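
The calculation in Example 8.16 can be sketched as follows. This is an
illustration under stated assumptions rather than the procedure used to
produce Table 8.1: the drop-size distribution is discretized by the method of
rounding instead of the mean-preserving method, so small differences from the
tabulated values should be expected. The function names are hypothetical. The
recursion is the standard one for a compound geometric distribution, with
continuation probability $1/(1+\theta) = \psi(0)$.

```python
def survival_probabilities(theta, cdf_equilibrium, span, n_steps):
    """Return [phi(0), phi(span), ..., phi(n_steps*span)] (approximately)."""
    # Method of rounding: mass near j*span is placed at j*span.
    f = [cdf_equilibrium(0.5 * span)]
    for j in range(1, n_steps + 1):
        f.append(cdf_equilibrium((j + 0.5) * span)
                 - cdf_equilibrium((j - 0.5) * span))
    a = 1.0 / (1.0 + theta)  # probability of a further drop, psi(0)
    denom = 1.0 - a * f[0]
    g = [(1.0 - a) / denom]  # Pr(L = 0) for the discretized model
    for n in range(1, n_steps + 1):
        s = sum(f[j] * g[n - j] for j in range(1, n + 1))
        g.append(a * s / denom)
    # Running sums of the probabilities give Pr(L <= j*span).
    phi = []
    running = 0.0
    for p in g:
        running += p
        phi.append(running)
    return phi


def pareto_equilibrium_cdf(x, scale=1000.0):
    """Cdf of the Pareto distribution with alpha = 2 and mean 1,000."""
    return 1.0 - (scale / (scale + x)) ** 2


phi = survival_probabilities(theta=0.2, cdf_equilibrium=pareto_equilibrium_cdf,
                             span=5.0, n_steps=200)
print(f"phi(100)  ~ {phi[20]:.3f}")
print(f"phi(1000) ~ {phi[200]:.3f}")
```

The quadratic cost of the recursion is negligible here; for the largest
values of $u$ in Table 8.1 a larger span or a transform-based method from
Chapter 6 may be preferable.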

8.4.1 Exercises 

8.15 Suppose the number of claims follows the Poisson process and the amount
of an individual claim is exponentially distributed with mean 100. The
relative security loading is $\theta = 0.1$. Determine $\psi(1{,}000)$ by
using the method of this section. Use the method of rounding with a span of
50 to discretize the exponential distribution. Compare your answer to the
exact ruin probability (see Example 8.14).

8.16 Consider the problem of Example 8.5 with $\beta = 50$. Use the method of
this section (with discretization by rounding and a span of 1) to approximate
$\psi(200)$. Compare your answer to the exact ruin probability which may be
found in Example 8.19.

8.17 Suppose that the claim severity pdf is given by

$$f(x) = \sum_{k=1}^\infty q_k
  \frac{\beta^{-k}x^{k-1}e^{-x/\beta}}{(k-1)!}, \quad x > 0,$$

where $\sum_{k=1}^\infty q_k = 1$. Note that this is a mixture of gamma
densities.

(a) Show that

$$f_e(x) = \sum_{k=1}^\infty q_k^*
  \frac{\beta^{-k}x^{k-1}e^{-x/\beta}}{(k-1)!}, \quad x > 0,$$

where

$$q_k^* = \frac{\sum_{j=k}^\infty q_j}{\sum_{j=1}^\infty j q_j},
  \quad k = 1, 2, \ldots,$$

and also show that $\sum_{k=1}^\infty q_k^* = 1$.

(b) Define

$$Q^*(z) = \sum_{k=1}^\infty q_k^* z^k$$

and use the results of Exercise 6.35 to show that

$$\psi(u) = \sum_{n=1}^\infty c_n \sum_{j=0}^{n-1}
  \frac{(u/\beta)^j e^{-u/\beta}}{j!}, \quad u \ge 0,$$

where

$$C(z) = \left\{1 - \frac{1}{\theta}[Q^*(z) - 1]\right\}^{-1}$$

is a compound geometric pgf, with probabilities which may be computed
recursively by $c_0 = \theta(1+\theta)^{-1}$ and

$$c_k = \frac{1}{1+\theta}\sum_{j=1}^k q_j^* c_{k-j}, \quad k = 1, 2, \ldots.$$

(c) Use (b) to show that

$$\psi(u) = e^{-u/\beta}\sum_{j=0}^\infty \bar{C}_j
  \frac{(u/\beta)^j}{j!}, \quad u \ge 0,$$

where $\bar{C}_j = \sum_{k=j+1}^\infty c_k$, $j = 0, 1, \ldots$. Then use (b)
to show that the $\bar{C}_n$s may be computed recursively from

$$\bar{C}_n = \frac{1}{1+\theta}\sum_{k=1}^n q_k^* \bar{C}_{n-k}
  + \frac{1}{1+\theta}\sum_{k=n+1}^\infty q_k^*, \quad n = 1, 2, \ldots,$$

beginning with $\bar{C}_0 = (1+\theta)^{-1}$.

8.18 (a) Using Exercise 8.14(c) prove that

$$G(u,y) = \frac{1}{1+\theta}\int_0^u G(u-x,y)f_e(x)\,dx
  + \frac{1}{1+\theta}\int_u^{y+u} f_e(x)\,dx,$$

where $G(u, y)$ is defined in Section 8.3, and using Exercise 8.14(d)
that

$$\psi(u) = \frac{1}{1+\theta}\int_0^u \psi(u-x)f_e(x)\,dx
  + \frac{1}{1+\theta}\int_u^\infty f_e(x)\,dx,$$

where $f_e(x)$ is given by (8.8).

(b) Prove the results in (a) directly by using probabilistic arguments.
Hint: Condition on the amount of the first drop in surplus and use
the law of total probability.

8.5 CRAMÉR'S ASYMPTOTIC RUIN FORMULA AND TIJMS' APPROXIMATION

There is another very useful piece of information regarding the ruin
probability which involves the adjustment coefficient $\kappa$. The following
theorem gives a result known as Cramér's asymptotic ruin formula. The
notation $a(x) \sim b(x)$, $x \to \infty$, means
$\lim_{x\to\infty} a(x)/b(x) = 1$.

Theorem 8.17 Suppose $\kappa > 0$ satisfies (8.3). Then the ruin probability
satisfies

$$\psi(u) \sim Ce^{-\kappa u}, \quad u \to \infty, \tag{8.21}$$

where

$$C = \frac{\mu\theta}{M_X'(\kappa) - \mu(1+\theta)}, \tag{8.22}$$

and $M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx}\,dF(x)$ is the moment
generating function of the claim severity random variable $X$.

Proof: The proof of this result is complicated and utilizes the key renewal
theorem together with a defective renewal equation for $\psi(u)$ which may be
found in Exercise 8.14(d), or equivalently in Exercise 8.18(a). The
interested reader should see Rolski et al. [113], Sec. 5.4.2 for details. □

For Lundberg's inequality (8.9) to hold, $C$ given by (8.22) must satisfy
$C \le 1$. Also, although (8.21) is an asymptotic approximation, it is known
to be quite accurate even for $u$ which is not too large (particularly if the
relative security loading $\theta$ is itself not too large). Before
continuing, let us consider an important special case.

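Example 8.18 (The exponential distribution) If $F(x) = 1 - e^{-x/\mu}$,
$x \ge 0$, determine the asymptotic ruin formula.

We found in Example 8.4 that the adjustment coefficient was given by
$\kappa = \theta/[\mu(1+\theta)]$ and $M_X(t) = (1 - \mu t)^{-1}$. Thus,

$$M_X'(t) = \mu(1-\mu t)^{-2}.$$

Also,

$$M_X'(\kappa) = \mu[1 - \theta(1+\theta)^{-1}]^{-2} = \mu(1+\theta)^2.$$

Thus, from (8.22),

$$C = \frac{\mu\theta}{\mu(1+\theta)^2 - \mu(1+\theta)}
  = \frac{\theta}{(1+\theta)(1+\theta-1)} = \frac{1}{1+\theta}.$$

The asymptotic formula (8.21) becomes

$$\psi(u) \sim \frac{1}{1+\theta}
  \exp\left[-\frac{\theta u}{\mu(1+\theta)}\right], \quad u \to \infty.$$

This is the exact ruin probability as was demonstrated in Example 8.14. □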

In cases other than when $F(x)$ is the exponential distribution, the exact
solution for $\psi(u)$ is more complicated (including in particular the
general compound geometric solution given in Section 8.4). A simple analytic
approximation was suggested by Tijms [129], pp. 271-272, to take advantage of
the accuracy for large $u$ of Cramér's asymptotic ruin formula given in
Theorem 8.17. The idea is to add an exponential term to (8.21) to improve the
accuracy for small $u$ as well. Thus, the Tijms approximation is defined as

$$\psi_T(u) = \left(\frac{1}{1+\theta} - C\right)e^{-u/\alpha}
  + Ce^{-\kappa u}, \quad u \ge 0, \tag{8.23}$$

where $\alpha$ is chosen so that the approximation also matches the compound
geometric mean of the maximum aggregate loss. As shown in Section 4.3.3,
B we have $E(K) = 1/\theta$. Because the maximum aggregate loss is the
compound geometric random variable $L$, it follows from (6.6) that its mean
is

$$E(L) = E(K)E(Y) = \frac{E(X^2)}{2\mu\theta}.$$

But $\psi(u) = \Pr(L > u)$, and from (3.9) on page 32 with $k = 1$ and
$u = \infty$ we have $E(L) = \int_0^\infty \psi(u)\,du$. Therefore, in order
that the Tijms approximation match the mean, we need to replace $\psi(u)$ by
$\psi_T(u)$ in the integral. Thus from (8.23)

$$\int_0^\infty \psi_T(u)\,du
  = \left(\frac{1}{1+\theta} - C\right)\int_0^\infty e^{-u/\alpha}\,du
  + C\int_0^\infty e^{-\kappa u}\,du
  = \alpha\left(\frac{1}{1+\theta} - C\right) + \frac{C}{\kappa},$$

and equating this to $E(L)$ yields

$$\alpha\left(\frac{1}{1+\theta} - C\right) + \frac{C}{\kappa}
  = \frac{E(X^2)}{2\mu\theta}, \tag{8.24}$$

which may be solved for $\alpha$ to give

$$\alpha = \frac{E(X^2)/(2\mu\theta) - C/\kappa}{1/(1+\theta) - C}.$$

Tijms' approximation to the ruin probability is given by (8.23),

providing a simple analytic approximation
approximation ($\psi_T$) has the added benefit

$x > 0$, with $0 < p < 1$ (of course, if $p = 1$, this is the exponential
density for which Tijms' approximation is not used). We have the
following example.

Example 8.19 (A gamma distribution with a shape parameter$^4$ of 2) As
in Example 8.5, suppose that $\theta = 2$, and the single claim size density
is $f(x) = \beta^{-2}xe^{-x/\beta}$, $x \ge 0$. Determine the Tijms
approximation to the ruin probability.

4 For a gamma distribution, the shape parameter is the one denoted by $\alpha$ in the appendix.

The moment generating function is $M_X(t) = (1 - \beta t)^{-2}$,
$t < 1/\beta$, from which one finds that
$M_X'(t) = 2\beta(1 - \beta t)^{-3}$ and $\mu = M_X'(0) = 2\beta$. As shown
in Example 8.5, the adjustment coefficient $\kappa > 0$ satisfies
$1 + (1+\theta)\kappa\mu = M_X(\kappa)$, which in this example becomes
$1 + 6\beta\kappa = (1 - \beta\kappa)^{-2}$ and is given by
$\kappa = 1/(2\beta)$. We will first compute Cramér's asymptotic ruin
formula. We have
$M_X'[1/(2\beta)] = 2\beta(1 - \tfrac{1}{2})^{-3} = 16\beta$. Thus, (8.22)
yields

$$C = \frac{(2\beta)(2)}{16\beta - (2\beta)(1+2)} = \frac{2}{5},$$

and from (8.21), $\psi(u) \sim \tfrac{2}{5}e^{-u/(2\beta)}$, $u \to \infty$.
We next turn to Tijms' approximation given by (8.23), which becomes in this
case

$$\psi_T(u) = \left(\frac{1}{1+2} - \frac{2}{5}\right)e^{-u/\alpha}
  + \frac{2}{5}e^{-u/(2\beta)}
  = \frac{2}{5}e^{-u/(2\beta)} - \frac{1}{15}e^{-u/\alpha}, \quad u \ge 0.$$

It remains to compute $\alpha$. We have
$M_X''(t) = 6\beta^2(1 - \beta t)^{-4}$, from which it follows that
$E(X^2) = M_X''(0) = 6\beta^2$. The amount of the drop in surplus has mean
$E(Y) = E(X^2)/(2\mu) = 6\beta^2/(4\beta) = 3\beta/2$. Because the number of
drops has mean $E(K) = 1/\theta = \tfrac{1}{2}$, the maximum aggregate loss
has mean $E(L) = E(K)E(Y) = 3\beta/4$, and $\alpha$ must satisfy
$E(L) = \int_0^\infty \psi_T(u)\,du$, or equivalently (8.24). That is,
$\alpha$ is given by$^5$

$$\alpha = \frac{\frac{3\beta}{4} - \frac{2}{5}(2\beta)}
  {\frac{1}{1+2} - \frac{2}{5}} = \frac{3\beta}{4}.$$

Tijms' approximation thus becomes

$$\psi_T(u) = \frac{2}{5}e^{-u/(2\beta)} - \frac{1}{15}e^{-4u/(3\beta)},
  \quad u \ge 0.$$

As mentioned above, $\psi(u) = \psi_T(u)$ in this case. □
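
Formulas (8.22), (8.23), and (8.24) translate directly into code. The sketch
below is illustrative only; the function names are not from the text, and the
adjustment coefficient is taken as given rather than solved for. It is
applied to the data of Example 8.19 with the arbitrary choice $\beta = 1$.

```python
import math


def cramer_constant(mu, theta, mgf_prime_at_kappa):
    """Constant C of Cramer's asymptotic ruin formula, equation (8.22)."""
    return mu * theta / (mgf_prime_at_kappa - mu * (1.0 + theta))


def tijms_alpha(mu, theta, second_moment, kappa, big_c):
    """Value of alpha that solves the mean-matching condition (8.24)."""
    mean_max_loss = second_moment / (2.0 * mu * theta)  # E(L)
    return (mean_max_loss - big_c / kappa) / (1.0 / (1.0 + theta) - big_c)


def tijms_ruin(u, theta, kappa, big_c, alpha):
    """Tijms' approximation to the ruin probability, equation (8.23)."""
    return ((1.0 / (1.0 + theta) - big_c) * math.exp(-u / alpha)
            + big_c * math.exp(-kappa * u))


# Example 8.19: gamma claims with shape 2 and scale beta, theta = 2.
beta, theta = 1.0, 2.0
mu = 2.0 * beta
second_moment = 6.0 * beta ** 2
kappa = 1.0 / (2.0 * beta)
mgf_prime_at_kappa = 2.0 * beta * (1.0 - beta * kappa) ** -3

big_c = cramer_constant(mu, theta, mgf_prime_at_kappa)
alpha = tijms_alpha(mu, theta, second_moment, kappa, big_c)
print(f"C = {big_c:.4f}, alpha = {alpha:.4f}")
print(f"psi_T(1) = {tijms_ruin(1.0, theta, kappa, big_c, alpha):.4f}")
```

Keeping the three formulas as separate functions makes it easy to reuse them
with an adjustment coefficient obtained numerically.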

Another class of claim severity distributions for which the Tijms
approximation is exactly equal to the true ruin probability is that with
probability density function of the form
$f(x) = p(\beta_1e^{-\beta_1x}) + (1 - p)(\beta_2e^{-\beta_2x})$, $x \ge 0$.
If $0 < p < 1$, then this distribution is a mixture of two exponentials,
whereas if $p = \beta_2/(\beta_2 - \beta_1)$, then the distribution, referred
to as a combination of two exponentials, is that of the sum of two
independent exponential random variables with means $\beta_1^{-1}$ and
$\beta_2^{-1}$, where $\beta_1 \ne \beta_2$. The next example illustrates
these ideas.

5 It is actually not a coincidence that $1/\alpha$ is the other root of the adjustment coefficient
equation, as may be seen from Example 8.5. It is instructive to compute $\alpha$ in this manner,
however, because this approach is applicable in general for arbitrary claim size distributions,

Example 8.20 (A mixture of exponential distributions) Suppose that
$\theta = \tfrac{4}{11}$ and the single claim size density is
$f(x) = e^{-3x} + 10e^{-5x}/3$, $x \ge 0$. Determine the Tijms approximation
to the ruin probability.

First we note that the moment generating function is

$$M_X(t) = \int_0^\infty e^{tx}f(x)\,dx
  = (3-t)^{-1} + \tfrac{10}{3}(5-t)^{-1}.$$

Thus, $M_X'(t) = (3-t)^{-2} + \tfrac{10}{3}(5-t)^{-2}$, from which it follows
that $\mu = M_X'(0) = \tfrac{1}{9} + \tfrac{2}{15} = \tfrac{11}{45}$.
Equation (8.3) then implies that the adjustment coefficient $\kappa > 0$
satisfies
$1 + \tfrac{1}{3}\kappa = (3-\kappa)^{-1} + \tfrac{10}{3}(5-\kappa)^{-1}$.
Multiplication by $3(3-\kappa)(5-\kappa)$ yields

$$3(\kappa-3)(\kappa-5) + \kappa(\kappa-3)(\kappa-5)
  = 3(5-\kappa) + 10(3-\kappa).$$

That is,

$$3(\kappa^2 - 8\kappa + 15) + \kappa^3 - 8\kappa^2 + 15\kappa
  = 45 - 13\kappa.$$

Rearrangement yields

$$0 = \kappa^3 - 5\kappa^2 + 4\kappa = \kappa(\kappa-1)(\kappa-4),$$

and $\kappa = 1$ because it is the smallest positive root.

Next, we determine Cramér's asymptotic formula. Equation (8.22) becomes,
with
$M_X'(\kappa) = M_X'(1) = \tfrac{1}{4} + \tfrac{10}{3}\cdot\tfrac{1}{16}
= \tfrac{11}{24}$,

$$C = \frac{\left(\frac{11}{45}\right)\left(\frac{4}{11}\right)}
  {\frac{11}{24} - \left(\frac{11}{45}\right)\left(\frac{15}{11}\right)}
  = \frac{32}{45},$$

and thus Cramér's asymptotic formula is
$\psi(u) \sim \tfrac{32}{45}e^{-u}$, $u \to \infty$.

Equation (8.23) then becomes

$$\psi_T(u) = \left(\frac{1}{1+\frac{4}{11}} - \frac{32}{45}\right)
  e^{-u/\alpha} + \frac{32}{45}e^{-u}
  = \frac{1}{45}e^{-u/\alpha} + \frac{32}{45}e^{-u}, \quad u \ge 0.$$

In order to compute $\alpha$, we note that
$M_X''(t) = 2(3-t)^{-3} + \tfrac{20}{3}(5-t)^{-3}$, and thus
$E(X^2) = M_X''(0) = \tfrac{2}{27} + \tfrac{20}{3}\cdot\tfrac{1}{125}
= \tfrac{86}{675}$. The mean of the maximum aggregate loss is therefore

$$E(L) = \frac{\frac{86}{675}}
  {2\left(\frac{11}{45}\right)\left(\frac{4}{11}\right)} = \frac{43}{60}.$$

Equation (8.24) then yields

$$\alpha = \frac{\frac{43}{60} - \frac{32}{45}}{\frac{1}{45}}
  = \frac{1}{4},$$

and so Tijms' approximation becomes
$\psi_T(u) = \tfrac{32}{45}e^{-u} + \tfrac{1}{45}e^{-4u}$, $u \ge 0$. As
mentioned above, $\psi(u) = \psi_T(u)$ in this case also. □
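
The arithmetic of Example 8.20 involves only rational numbers, so it can be
checked exactly. The following sketch uses exact fractions; the
representation of the density as a list of (coefficient, rate) pairs and the
variable names are choices made here for illustration, not notation from the
text.

```python
from fractions import Fraction as Fr

theta = Fr(4, 11)
# f(x) = e^{-3x} + (10/3) e^{-5x}, stored as (coefficient, rate) pairs.
terms = [(Fr(1), Fr(3)), (Fr(10, 3), Fr(5))]


def mgf(t):
    return sum(a / (r - t) for a, r in terms)


def mgf_d1(t):
    return sum(a / (r - t) ** 2 for a, r in terms)


def mgf_d2(t):
    return sum(2 * a / (r - t) ** 3 for a, r in terms)


mu = mgf_d1(Fr(0))            # mean claim size
second_moment = mgf_d2(Fr(0))  # E(X^2)
kappa = Fr(1)                  # smallest positive root found in the example

# Check that kappa solves the adjustment coefficient equation (8.3).
kappa_is_root = (1 + (1 + theta) * mu * kappa == mgf(kappa))

big_c = mu * theta / (mgf_d1(kappa) - mu * (1 + theta))      # (8.22)
mean_max_loss = second_moment / (2 * mu * theta)             # E(L)
alpha = (mean_max_loss - big_c / kappa) / (1 / (1 + theta) - big_c)  # (8.24)

print(mu, big_c, mean_max_loss, alpha, kappa_is_root)
```

Exact arithmetic also makes the remark in footnote 5 easy to confirm: the
value $1/\alpha$ coincides with the other positive root of the cubic.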

It is not hard to see from (8.23) that $\psi_T(u) \sim Ce^{-\kappa u}$,
$u \to \infty$, if $\kappa < 1/\alpha$. In this situation, $\psi_T(u)$ will
equal $\psi(u)$ when $u = 0$ and when $u \to \infty$ as well as matching the
compound geometric mean. It can be shown that a sufficient condition for the
asymptotic agreement between $\psi_T(u)$ and $\psi(u)$ to hold as
$u \to \infty$ is that the nonexponential claim size cumulative distribution
function $F(x)$ has either a nondecreasing or nonincreasing mean residual
life function [which is implied if $F(x)$ has a nonincreasing or
nondecreasing hazard rate, as discussed in Section 4.3.3]. It is also
interesting to note that $\psi_T(x) > Ce^{-\kappa x}$ in the former case and
$\psi_T(x) < Ce^{-\kappa x}$ in the latter case. See Willmot [137] for proofs
of these facts.

The following example illustrates the accuracy of Cramér's asymptotic
formula and Tijms' approximation.

Example 8.21 (A gamma distribution with a shape parameter of 3) Suppose
the claim severity distribution is a gamma distribution with a mean of 1 and
density given by $f(x) = 27x^2e^{-3x}/2$, $x \ge 0$. Determine the exact ruin
probability, Cramér's asymptotic ruin formula, and Tijms' approximation when
the relative security loading $\theta$ in each is 0.25, 1, and 4, and the
initial surplus $u$ is 0.10, 0.25, 0.50, 0.75, and 1.

The moment generating function is $M_X(t) = (1 - t/3)^{-3}$.

The exact values of $\psi(u)$ may be obtained using the algorithm presented
in Exercise 8.17. That is,
$\psi(u) = e^{-3u}\sum_{j=0}^\infty \bar{C}_j(3u)^j/j!$, $u \ge 0$, where the
$\bar{C}_j$s may be computed recursively using

$$\bar{C}_j = \frac{1}{1+\theta}\sum_{k=1}^j q_k^*\bar{C}_{j-k}
  + \frac{1}{1+\theta}\sum_{k=j+1}^\infty q_k^*, \quad j = 1, 2, 3, \ldots,$$

with $\bar{C}_0 = 1/(1+\theta)$, $q_1^* = q_2^* = q_3^* = \tfrac{1}{3}$, and
$q_k^* = 0$ otherwise. The required values are listed in Table 8.2 under the
heading titled "Exact."

Cramér's asymptotic ruin probabilities are given by the approximation
(8.21), with $\kappa$ obtained from (8.3) numerically for each value of
$\theta$ using the Newton-Raphson approach described in Section 8.2.1. The
coefficient $C$ is then obtained from (8.22). The required values are listed
in Table 8.2 under the heading titled "Cramér."

Tijms' approximation is obtained using (8.23) with $\alpha$ satisfying
(8.24), and the values are listed in Table 8.2 under the heading titled
"Tijms."

The values in the table, which may also be found in Tijms [129], p. 272
and Willmot [137], demonstrate that Tijms' approximation is an accurate
approximation to the true value in this situation, particularly for small
$\theta$.

Table 8.2 Ruin probabilities with gamma losses

   theta      u      Exact    Cramér     Tijms
    0.25    0.10    0.7834    0.8076    0.7844
            0.25    0.7562    0.7708    0.7571
            0.50    0.7074    0.7131    0.7074
            0.75    0.6577    0.6597    0.6573
            1.00    0.6097    0.6103    0.6093
    1.00    0.10    0.4744    0.5332    0.4764
            0.25    0.4342    0.4700    0.4361
            0.50    0.3664    0.3809    0.3665
            0.75    0.3033    0.3088    0.3026
            1.00    0.2484    0.2502    0.2476
    4.00    0.10    0.1839    0.2654    0.1859
            0.25    0.1594    0.2106    0.1615
            0.50    0.1209    0.1432    0.1212
            0.75    0.0882    0.0974    0.0875
            1.00    0.0626    0.0663    0.0618
in Example 4.17), Tijms' approximate ruin probabilities are guaranteed to be
smaller than Cramér's asymptotic ruin probabilities, and this may be seen
to be true from the table. It also follows that the exact values, Cramér's
asymptotic values, and Tijms' approximate values all must converge as
$u \to \infty$, but the agreement can be seen to be fairly close even for
$u = 1$. □
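
The three columns of Table 8.2 can be recomputed with a short program. The
sketch below follows the description in Example 8.21 with two departures that
are choices made here, not the text's: the adjustment coefficient is found by
bisection rather than Newton-Raphson (the mgf exists only for $t < 3$, so the
starting value (8.5) can fall outside its domain), and the infinite series is
truncated at a fixed number of terms. All function names are illustrative.

```python
import math


def exact_ruin(u, theta, n_terms=80):
    """psi(u) from the series of Example 8.21 (gamma shape 3, mean 1)."""
    q_star = {1: 1.0 / 3.0, 2: 1.0 / 3.0, 3: 1.0 / 3.0}
    a = 1.0 / (1.0 + theta)
    c_bar = [a]
    for j in range(1, n_terms):
        head = sum(q_star.get(k, 0.0) * c_bar[j - k] for k in range(1, j + 1))
        tail = sum(q for k, q in q_star.items() if k > j)
        c_bar.append(a * (head + tail))
    x = 3.0 * u
    term, total = 1.0, 0.0  # term holds x**j / j!
    for j in range(n_terms):
        total += c_bar[j] * term
        term *= x / (j + 1)
    return math.exp(-x) * total


def adjustment_coefficient(theta, tol=1e-13):
    """Positive root of 1 + (1+theta)*t = (1 - t/3)**-3, by bisection."""
    def h(t):
        return 1.0 + (1.0 + theta) * t - (1.0 - t / 3.0) ** -3
    lo, hi = 1e-6, 3.0 - 1e-9  # h(lo) > 0 and h(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cramer_and_tijms(u, theta):
    """Return (Cramer value, Tijms value) at initial surplus u."""
    mu, second_moment = 1.0, 4.0 / 3.0
    kappa = adjustment_coefficient(theta)
    mgf_prime = (1.0 - kappa / 3.0) ** -4
    big_c = mu * theta / (mgf_prime - mu * (1.0 + theta))        # (8.22)
    weight = 1.0 / (1.0 + theta) - big_c
    alpha = (second_moment / (2.0 * mu * theta) - big_c / kappa) / weight
    cramer = big_c * math.exp(-kappa * u)                         # (8.21)
    tijms = weight * math.exp(-u / alpha) + cramer                # (8.23)
    return cramer, tijms


exact = exact_ruin(0.10, 1.0)
cramer, tijms = cramer_and_tijms(0.10, 1.0)
print(f"theta=1, u=0.10: exact {exact:.4f}, Cramer {cramer:.4f}, Tijms {tijms:.4f}")
```

For $\theta = 1$ and $u = 0.10$ the printed values can be compared with the
corresponding row of Table 8.2.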

8.5.1 Exercises 

8.19 Show that (8.22) may be reexpressed as

$$C = \frac{\theta}{\kappa E(Ye^{\kappa Y})},$$

where $Y$ has pdf $f_e(y)$. Hence prove for the problem of Exercise 8.17 that

$$\psi(u) \sim \frac{\theta}
  {\kappa\beta\sum_{j=1}^\infty jq_j^*(1-\beta\kappa)^{-j-1}}
  e^{-\kappa u}, \quad u \to \infty,$$

where $\kappa > 0$ satisfies

$$1 + \theta = \sum_{j=1}^\infty q_j^*(1-\beta\kappa)^{-j}.$$

8.20 Recall the function $G(u, y)$ defined in Section 8.3. It can be shown
using the result of Exercise 8.14(c) that Cramér's asymptotic ruin formula
may be generalized to

$$G(u,y) \sim C(y)e^{-\kappa u}, \quad u \to \infty,$$

where

$$C(y) = \frac{\mu\kappa\int_0^\infty e^{\kappa t}
  \int_t^{t+y} f_e(x)\,dx\,dt}{M_X'(\kappa) - \mu(1+\theta)}.$$

(a) Demonstrate that Cramér's asymptotic ruin formula is recovered
as $y \to \infty$.

(b) Demonstrate using Exercise 8.13 that the above asymptotic
formula for $G(u, y)$ is an equality for all $u$ in the exponential
claims case with $F(x) = 1 - e^{-x/\mu}$.

8.21 Suppose that the following formula for the ruin probability is known to
hold:

$$\psi(u) = C_1 e^{-r_1 u} + C_2 e^{-r_2 u}, \quad u \ge 0,$$

where $C_1 \ne 0$, $C_2 \ne 0$, and (without loss of generality) $0 < r_1 < r_2$.

(a) Determine the relative security loading $\theta$.

(b) Determine the adjustment coefficient $\kappa$.

(c) Prove that $0 < C_1 \le 1$.

(d) Determine Cramér's asymptotic ruin formula.

(e) Prove that $\psi_T(u) = \psi(u)$, where $\psi_T(u)$ is Tijms' approximation to
the ruin probability.

8.22 Suppose that $\theta = 1$ and the claim size density is given by
$f(x) = (1 + 6x)e^{-3x}$, $x \ge 0$.

(a) Determine Cramér's asymptotic ruin formula.

(b) Determine the ruin probability $\psi(u)$.

8.23 Suppose that 0 = 吾 and the claim size density is given by f(x) = 
$2e^{-4x} + \frac{7}{2}e^{-7x}$, $x \ge 0$.

(a) Determine Cramér's asymptotic ruin formula.

(b) Determine the ruin probability $\psi(u)$.

8.24 Suppose that 0 = | and the claim size density is given by f(x) = 
$3e^{-4x} + \frac{1}{2}e^{-2x}$, $x \ge 0$.

(a) Determine Cramér's asymptotic ruin formula.

(b) Determine the ruin probability $\psi(u)$.

8.25 Suppose that $\theta = 1$ and the claim size density is the convolution of two
exponential distributions given by $f(x) = \int_0^x 3e^{-3(x-y)}\,2e^{-2y}\,dy$, $x \ge 0$.

(a) Determine Cramér's asymptotic ruin formula.

(b) Determine the ruin probability $\psi(u)$.

Fig. 8.2 Sample path for a Poisson surplus process.

8.6 THE BROWNIAN MOTION RISK PROCESS

In this section, we study the relationship between Brownian motion (the
Wiener process) and the surplus process $\{U_t;\ t \ge 0\}$, where

$$U_t = u + ct - S_t, \quad t \ge 0, \qquad (8.25)$$

and $\{S_t;\ t \ge 0\}$ is the total loss process defined by

$$S_t = X_1 + X_2 + \cdots + X_{N_t}, \quad t \ge 0,$$

where $\{N_t;\ t \ge 0\}$ is a Poisson process with rate $\lambda$ and $S_t = 0$ when $N_t = 0$.
As earlier in this chapter, we assume that the individual losses $\{X_1, X_2, \ldots\}$
are independently and identically distributed positive random variables whose
moment generating function exists. The surplus process $\{U_t;\ t \ge 0\}$ increases
continuously with slope $c$, the premium rate per unit time, and has successive
downward jumps of $\{X_1, X_2, \ldots\}$ at random jump times $\{T_1, T_2, \ldots\}$, as
illustrated by Figure 8.2. In that figure, $u = 20$, $c = 35$, $\lambda = 3$, and $X$ has an
exponential distribution with mean 10.

Let

$$Z_t = U_t - u = ct - S_t, \quad t \ge 0. \qquad (8.26)$$

Then $Z_0 = 0$. Because $S_t$ has a compound distribution, the process
$\{Z_t;\ t \ge 0\}$ has mean

$$\mathrm{E}(Z_t) = ct - \mathrm{E}(S_t) = ct - \lambda t\,\mathrm{E}(X)$$

and variance

$$\mathrm{Var}(Z_t) = \lambda t\,\mathrm{E}(X^2).$$
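
The mean and variance formulas can be checked by simulation. The following is only a sketch under stated assumptions: it uses the parameters quoted for Figure 8.2 ($c = 35$, $\lambda = 3$, exponential losses with mean 10), while the function name `simulate_z`, the seed, the horizon $t = 1$, and the number of draws are illustrative choices that do not come from the text.

```python
import random
import statistics

def simulate_z(t, c, lam, mean_loss, rng):
    """One draw of Z_t = c*t - S_t for a compound Poisson loss process
    with exponential losses (the loss distribution quoted for Figure 8.2)."""
    total_loss = 0.0
    clock = rng.expovariate(lam)              # time of the first claim
    while clock <= t:
        total_loss += rng.expovariate(1.0 / mean_loss)   # one exponential loss
        clock += rng.expovariate(lam)         # waiting time to the next claim
    return c * t - total_loss

rng = random.Random(2024)                     # arbitrary seed (assumption)
t, c, lam, mean_loss = 1.0, 35.0, 3.0, 10.0
draws = [simulate_z(t, c, lam, mean_loss, rng) for _ in range(4000)]

theory_mean = c * t - lam * t * mean_loss     # ct - lambda*t*E(X)
theory_var = lam * t * 2.0 * mean_loss ** 2   # lambda*t*E(X^2), E(X^2) = 2*mean^2
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print("mean:", theory_mean, "simulated:", round(sample_mean, 2))
print("variance:", theory_var, "simulated:", round(sample_var, 1))
```

With these parameters the theoretical mean and variance per unit time are 5 and 600; the simulated values should be close to them but will differ by sampling error.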


We now introduce the corresponding stochastic process based on Brownian 
motion. 

Definition 8.22 A continuous-time stochastic process $\{W_t;\ t \ge 0\}$ is a
Brownian motion process if

1. $W_0 = 0$;

2. $\{W_t;\ t \ge 0\}$ has stationary and independent increments; and

3. for every $t > 0$, $W_t$ is normally distributed with mean 0 and variance
$\sigma^2 t$.


The Brownian motion process, also called the Wiener process or white
noise, has been used extensively in describing various physical phenomena.
When $\sigma^2 = 1$, it is called standard Brownian motion. The English
botanist Robert Brown discovered the process in 1827 and used it to describe
the continuous irregular motion of a particle immersed in a liquid or gas. In
1905 Albert Einstein explained this motion by postulating perpetual collision
of the particle with the surrounding medium. Norbert Wiener provided the
analytical description of the process in a series of papers beginning in 1918.
Since then it has been used in many areas of application, from quantum
mechanics to describing price levels on the stock market. It has become the key
model underpinning modern financial theory.

Definition 8.23 A continuous-time stochastic process $\{W_t;\ t \ge 0\}$ is called
a Brownian motion with drift process if it satisfies the properties of a
Brownian motion process except that $W_t$ has mean $\mu t$ rather than 0, for some
$\mu > 0$.

A Brownian motion with drift is illustrated in Figure 8.3. This process has
$u = 20$, $\mu = 5$, and $\sigma^2 = 600$. The illustrated process has an initial surplus of
20, so the mean of $W_t$ is $20 + 5t$. Technically, $W_t - 20$ is a Brownian motion
with drift process.

We now show how the surplus process (8.26) based on the compound Poisson
risk process is related to the Brownian motion with drift process. We
will take a limit of the process (8.26) as the expected number of downward
jumps becomes large and simultaneously the size of the jumps becomes small.
Because the Brownian motion with drift process is characterized by the
infinitesimal mean $\mu$ and infinitesimal variance $\sigma^2$, we force the mean and variance
functions to be the same for the two processes. In this way, the Brownian
motion with drift can be thought of as an approximation to the compound
Poisson-based surplus process. Similarly, the compound Poisson process can
be used as an approximation for Brownian motion.

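Let

$$\mu = c - \lambda\,\mathrm{E}[X] \qquad (8.27)$$

and

$$\sigma^2 = \lambda\,\mathrm{E}[X^2] \qquad (8.28)$$

denote the infinitesimal mean and variance of the Brownian motion with drift
process. Then

$$\lambda = \frac{\sigma^2}{\mathrm{E}[X^2]} \quad \text{and} \quad c = \mu + \sigma^2\,\frac{\mathrm{E}[X]}{\mathrm{E}[X^2]}.$$

To take limits, treat the jump size $X$ as a scaled version of another random
variable $Y$, so that $X = \alpha Y$, where $Y$ has fixed mean and variance. Then

$$\lambda = \frac{\sigma^2}{\alpha^2\,\mathrm{E}[Y^2]} \quad \text{and} \quad c = \mu + \frac{\sigma^2\,\mathrm{E}[Y]}{\alpha\,\mathrm{E}[Y^2]}.$$

Then, in order for $\lambda \to \infty$, we let $\alpha \to 0$.

Because the process $\{S_t;\ t \ge 0\}$ is a continuous-time process with
stationary and independent increments, so are the processes $\{U_t;\ t \ge 0\}$ and
$\{Z_t;\ t \ge 0\}$. This will then also be the case for the limiting process. Because
$Z_0 = 0$, we only need to establish that for every $t$, in the limit, $Z_t$ is normally
distributed with mean $\mu t$ and variance $\sigma^2 t$ according to Definitions 8.22 and
8.23. We do this by looking at the moment generating function of $Z_t$:

$$\begin{aligned}
M_{Z_t}(r) &= M_{ct - S_t}(r) \\
&= \mathrm{E}\{\exp[r(ct - S_t)]\} \\
&= \exp\big(t\{rc + \lambda[M_X(-r) - 1]\}\big).
\end{aligned}$$

Then

$$\begin{aligned}
\frac{\ln M_{Z_t}(r)}{t} &= rc + \lambda[M_X(-r) - 1] \\
&= r[\mu + \lambda\,\mathrm{E}(X)] + \lambda\left[1 - r\,\mathrm{E}(X) + \frac{r^2}{2!}\mathrm{E}(X^2) - \frac{r^3}{3!}\mathrm{E}(X^3) + \cdots - 1\right] \\
&= r\mu + \frac{r^2}{2}\lambda\,\mathrm{E}(X^2) - \lambda\left[\frac{r^3}{3!}\mathrm{E}(X^3) - \frac{r^4}{4!}\mathrm{E}(X^4) + \cdots\right] \\
&= r\mu + \frac{r^2}{2}\sigma^2 - \lambda\alpha^2\left[\alpha\frac{r^3}{3!}\mathrm{E}(Y^3) - \alpha^2\frac{r^4}{4!}\mathrm{E}(Y^4) + \cdots\right] \\
&= r\mu + \frac{r^2}{2}\sigma^2 - \sigma^2\left[\alpha\frac{r^3}{3!}\frac{\mathrm{E}(Y^3)}{\mathrm{E}(Y^2)} - \alpha^2\frac{r^4}{4!}\frac{\mathrm{E}(Y^4)}{\mathrm{E}(Y^2)} + \cdots\right].
\end{aligned}$$

Because all terms except $\alpha$ are fixed, as $\alpha \to 0$, we have

$$\lim_{\alpha \to 0} M_{Z_t}(r) = \exp\left(r\mu t + \frac{r^2}{2}\sigma^2 t\right),$$

which is the mgf of the normal distribution with mean $\mu t$ and variance $\sigma^2 t$. This
establishes that the limiting process is Brownian motion with drift.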
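
The convergence of the exponent can be illustrated numerically. The sketch below assumes, purely for illustration, that the jump size is a scaled exponential, $X = \alpha Y$ with $Y$ exponential with mean 1, so that $M_X(-r) = 1/(1 + \alpha r)$; the function name `log_mgf_rate` and the values of $r$, $\mu$, and $\sigma^2$ are likewise assumptions rather than part of the text.

```python
def log_mgf_rate(r, mu, sigma2, alpha):
    """ln M_{Z_t}(r) / t = r*c + lam*(M_X(-r) - 1) when X = alpha*Y and
    Y is exponential with mean 1 (an illustrative choice, not from the text)."""
    ey, ey2 = 1.0, 2.0                      # E(Y) and E(Y^2) for Y ~ Exp(mean 1)
    lam = sigma2 / (alpha ** 2 * ey2)       # Poisson rate implied by (8.28)
    c = mu + lam * alpha * ey               # premium rate implied by (8.27)
    mgf_x = 1.0 / (1.0 + alpha * r)         # M_X(-r) for exponential X with mean alpha
    return r * c + lam * (mgf_x - 1.0)

mu, sigma2, r = 5.0, 600.0, 0.1
limit = r * mu + 0.5 * r ** 2 * sigma2      # exponent for Brownian motion with drift
for alpha in (1.0, 0.1, 0.01, 0.001):
    print(alpha, log_mgf_rate(r, mu, sigma2, alpha), limit)
```

For this choice of $Y$ the exponent simplifies to $r\mu + r^2\sigma^2/[2(1 + \alpha r)]$, so the printed values approach the limit as $\alpha$ shrinks.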

From Figure 8.2, it is clear that the process $U_t$ is differentiable everywhere
except at jump points. As the number of jump points increases indefinitely,
the process becomes nowhere differentiable. Another property of a Brownian
motion process is that its paths are continuous functions of $t$ with
probability 1. Intuitively, this occurs because the jump sizes become small
as $\alpha \to 0$.

Finally, the total distance traveled in $(0, t]$ by the process $U_t$ is

$$D = ct + S_t = ct + X_1 + \cdots + X_{N_t},$$

which has expected value

$$\mathrm{E}[D] = ct + \lambda t\,\mathrm{E}[X].$$

This quantity becomes indefinitely large as $\alpha \to 0$. Hence, we have

$$\lim_{\alpha \to 0} \mathrm{E}[D] = \infty.$$

This means that the expected distance traveled in a finite time interval is
infinitely large! For a more rigorous discussion of the properties of the Brownian
motion process, the text by Karlin and Taylor [71], Ch. 7 is recommended.

Because $Z_t = U_t - u$, we can just add $u$ to the Brownian motion with drift
process and then use (8.27) and (8.28) to develop an approximation for the
process (8.26). Of course, the larger the value of $\lambda$ and the smaller the jumps,
the better will be the approximation. For a very large block of insurance
policies (for example, for an entire company), this may be appropriate. In
this case, the probability of ultimate ruin and the distribution of time until
ruin are easily obtained from the approximating Brownian motion with drift
process. This is done in the next section. Similarly, if a process is known to
be Brownian motion with drift, a compound Poisson surplus process can be
used as an approximation.


8.7 BROWNIAN MOTION AND THE PROBABILITY OF RUIN

Let $\{W_t;\ t \ge 0\}$ denote the Brownian motion with drift process with mean
function $\mu t$ and variance function $\sigma^2 t$. Let $U_t = u + W_t$ denote the Brownian
motion with drift process with initial surplus $U_0 = u$.

We consider the probability of ruin in a finite time interval $(0, \tau)$ as well as
the distribution of time until ruin if ruin occurs. Let $T = \min_{t > 0}\{t : U_t < 0\}$
be the time at which ruin occurs (with $T = \infty$ if ruin does not occur). Letting
$\tau \to \infty$ will give ultimate ruin probabilities.

The probability of ruin before time $\tau$ can be expressed as

$$\begin{aligned}
\psi(u, \tau) &= 1 - \phi(u, \tau) \\
&= \Pr\{T < \tau\} \\
&= \Pr\Big\{\min_{0 < t < \tau} U_t < 0\Big\} \\
&= \Pr\Big\{\min_{0 < t < \tau} W_t < -U_0\Big\} \\
&= \Pr\Big\{\min_{0 < t < \tau} W_t < -u\Big\}.
\end{aligned}$$

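Theorem 8.24 For the process $U_t$ described above, the ruin probability is
given by

$$\psi(u, \tau) = \Phi\left(-\frac{u + \mu\tau}{\sqrt{\sigma^2\tau}}\right) + \exp\left(-\frac{2\mu}{\sigma^2}u\right)\Phi\left(-\frac{u - \mu\tau}{\sqrt{\sigma^2\tau}}\right), \qquad (8.29)$$

where $\Phi(\cdot)$ is the cdf of the standard normal distribution.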
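
Formula (8.29) is straightforward to evaluate. The sketch below is a minimal illustration, assuming the parameter values quoted for Figure 8.3 ($u = 20$, $\mu = 5$, $\sigma^2 = 600$); the function names `std_normal_cdf` and `ruin_prob_finite` and the chosen horizons are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ruin_prob_finite(u, tau, mu, sigma2):
    """Finite-time ruin probability (8.29) for Brownian motion with drift."""
    scale = math.sqrt(sigma2 * tau)
    return (std_normal_cdf(-(u + mu * tau) / scale)
            + math.exp(-2.0 * mu * u / sigma2)
            * std_normal_cdf(-(u - mu * tau) / scale))

u, mu, sigma2 = 20.0, 5.0, 600.0
for tau in (1.0, 10.0, 100.0, 1e6):
    print(tau, ruin_prob_finite(u, tau, mu, sigma2))
print("limit as tau grows:", math.exp(-2.0 * mu * u / sigma2))
```

As $\tau$ grows, the first term vanishes and the second normal cdf tends to 1, so the values increase toward $\exp(-2\mu u/\sigma^2)$.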


Fig. 8.4 Type B sample path and its reflection.

Proof: Any sample path of $U_t$ with a final level $U_\tau < 0$ must have
crossed the barrier $U_t = 0$ at some time $T < \tau$. For any such path $U_t$, we
define a new path $U_t^*$ which is the same as the original sample path for all
$t < T$ but is the reflection about the barrier $U_t = 0$ of the original sample
path for all $t > T$. Then

$$U_t^* = \begin{cases} U_t, & t \le T, \\ -U_t, & t > T. \end{cases}$$

The reflected path $U_t^*$ has final value $U_\tau^* = -U_\tau$. These are illustrated in
Figure 8.4, which is based on the sample path in Figure 8.3.

Now consider any path that crosses the barrier $U_t = 0$ in the time interval
$(0, \tau)$. Any such path is one of two possible types:

Type A: One that has a final value of $U_\tau < 0$.

Type B: One that has a final value of $U_\tau > 0$.

Any path of Type B is a reflection of some other path of Type A. Hence,
sample paths can be considered in reflecting pairs. The probability of ruin at
some time in $(0, \tau)$ is the total probability of all such pairs:

$$\psi(u, \tau) = \Pr\{T < \tau\} = \Pr\Big\{\min_{0 < t < \tau} U_t < 0\Big\},$$

where it is understood that all probabilities are conditional on $U_0 = u$. This
probability is obtained by considering all original paths of Type A with final
values $U_\tau = x < 0$. By adding all the corresponding reflecting paths as
well, all possible paths that cross the ruin barrier are considered. Note that
the case $U_\tau = 0$ has been left out. The probability of this happening is zero
and so this event can be ignored. In Figure 8.4 the original sample path ended
at a positive surplus value and so is of Type B. The reflection is of Type A.

Let $A_x$ and $B_x$ denote the sets of all possible paths of Types A and B,
respectively, which end at $U_\tau = x$ for Type A and $U_\tau = -x$ for Type B.
Further let $\Pr\{A_x\}$ and $\Pr\{B_x\}$ denote the total probability associated with
the paths in the sets.$^6$ Hence the probability of ruin is

$$\Pr\{T < \tau\} = \int_{-\infty}^{0} \Pr\{A_x\}\left[1 + \frac{\Pr\{B_x\}}{\Pr\{A_x\}}\right] dx. \qquad (8.31)$$

Because $A_x$ is the set of all possible paths ending at $x$,

$$\Pr\{A_x\} = \Pr\{U_\tau = x\},$$

where the right-hand side is the pdf of $U_\tau$. Then

$$\Pr\{T < \tau\} = \int_{-\infty}^{0} \Pr\{U_\tau = x\}\left[1 + \frac{\Pr\{B_x\}}{\Pr\{A_x\}}\right] dx. \qquad (8.32)$$

Because $U_t - u$ is a Brownian motion with drift process, $U_\tau$ is normally
distributed with mean $u + \mu\tau$ and variance $\sigma^2\tau$, and so

$$\Pr\{U_\tau = x\} = (2\pi\sigma^2\tau)^{-1/2}\exp\left[-\frac{(x - u - \mu\tau)^2}{2\sigma^2\tau}\right].$$

To obtain $\Pr\{B_x\}/\Pr\{A_x\}$, we condition on all possible ruin times $T$. Then










Given $T = t$, we have $U_t = 0$, so the conditional pdf of $U_\tau$ is the pdf of
$U_\tau - U_t$. The process has stationary and independent increments, and so
$U_\tau - U_t$ has a normal distribution. Then,


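$$\begin{aligned}
\Pr\{U_\tau = x \mid T = t\} &= \Pr\{U_\tau - U_t = x\} \\
&= \frac{\exp\left\{-\dfrac{[x - \mu(\tau - t)]^2}{2\sigma^2(\tau - t)}\right\}}{\sqrt{2\pi\sigma^2(\tau - t)}} \\
&= \frac{\exp\left\{-\dfrac{x^2 - 2x\mu(\tau - t) + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\}}{\sqrt{2\pi\sigma^2(\tau - t)}} \\
&= \exp\left(\frac{x\mu}{\sigma^2}\right)\frac{\exp\left\{-\dfrac{x^2 + \mu^2(\tau - t)^2}{2\sigma^2(\tau - t)}\right\}}{\sqrt{2\pi\sigma^2(\tau - t)}}.
\end{aligned}$$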


Similarly, by replacing x with —x, 










By letting $\tau \to \infty$, this result follows from Theorem 8.24. It should be
noted that the distribution (8.29) is a defective distribution because the
cdf does not approach 1 as $\tau \to \infty$. The corresponding proper distribution is
obtained by conditioning on ultimate ruin.


Corollary 8.26 The distribution of the time until ruin given that it occurs 


and cdf [from (8.34)]

This distribution is the one-sided stable law with index 1/2.

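These results can be used as approximations for the original surplus process
(8.26) based on the compound Poisson model. In this situation $c = (1 +
\theta)\lambda\,\mathrm{E}(X)$, where $\theta$ is the relative premium loading. We do this by substituting
back using (8.27) and (8.28).

Then, for the process (8.26) from (8.29), (8.33), and (8.35), we have

$$\psi(u, \tau) = \Phi\left[-\frac{u + \theta\lambda\tau\,\mathrm{E}(X)}{\sqrt{\lambda\tau\,\mathrm{E}(X^2)}}\right] + \exp\left[-\frac{2\,\mathrm{E}(X)}{\mathrm{E}(X^2)}\theta u\right]\Phi\left[-\frac{u - \theta\lambda\tau\,\mathrm{E}(X)}{\sqrt{\lambda\tau\,\mathrm{E}(X^2)}}\right],$$

$$\psi(u) = \exp\left[-\frac{2\,\mathrm{E}(X)}{\mathrm{E}(X^2)}\theta u\right], \quad u > 0,$$

$$f_T(\tau) = \frac{u}{\sqrt{2\pi\lambda\,\mathrm{E}(X^2)}}\,\tau^{-3/2}\exp\left\{-\frac{[u - \theta\lambda\tau\,\mathrm{E}(X)]^2}{2\lambda\tau\,\mathrm{E}(X^2)}\right\}, \quad \tau > 0.$$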

Then, for any compound Poisson-based process, it is easy to get simple
numerical approximations. For example, the expected time until ruin, given that
it occurs, is

$$\mathrm{E}(T) = \frac{u}{\mu} = \frac{u}{\theta\lambda\,\mathrm{E}(X)}.$$

Conversely, for any Brownian motion with
drift and variance parameters $\mu$ and $\sigma^2$, one can use (8.27) and (8.28) to
obtain


$$\mu = \theta\lambda\,\mathrm{E}(X), \qquad (8.37)$$

$$\sigma^2 = \lambda\,\mathrm{E}(X^2). \qquad (8.38)$$

It is convenient to fix the jump sizes so that $\mathrm{E}(X) = k$, say, and $\mathrm{E}(X^2) = k^2$.
Then we have

$$\lambda = \frac{\sigma^2}{k^2}, \qquad (8.39)$$

$$\theta = \frac{\mu}{\sigma^2}\,k. \qquad (8.40)$$

When $\mu$ and $\sigma^2$ are fixed, choosing a value of $k$ fixes $\lambda$ and $\theta$. Hence, the
Poisson-based process can be used to approximate the Brownian motion with
accuracy determined only by the parameter $k$. The smaller the value of $k$, the
smaller the jump sizes and the larger the number of jumps per unit time.

Hence, simulation of the Brownian motion process can be done using the
Poisson-based process. To simulate the Poisson-based process, the waiting
times between successive events are generated first because these are
exponentially distributed with mean $1/\lambda$. As $k$ becomes small, $\lambda$ becomes large
and the mean waiting time becomes small.
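
The simulation recipe just described can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the helper names `poisson_params` and `simulate_endpoint`, the seed, the jump size $k = 5$, and the number of draws are all hypothetical, and $\mu = 5$, $\sigma^2 = 600$ are borrowed from Figure 8.3.

```python
import random
import statistics

def poisson_params(mu, sigma2, k):
    """Solve (8.39)-(8.40) for the Poisson rate and loading, plus the implied
    premium rate c = (1 + theta) * lam * k, when every jump has size k."""
    lam = sigma2 / k ** 2
    theta = mu * k / sigma2
    return lam, theta, (1.0 + theta) * lam * k

def simulate_endpoint(mu, sigma2, k, horizon, rng):
    """One draw of Z_horizon = c*horizon - k*N_horizon, with the number of
    jumps N generated from exponential waiting times with mean 1/lam."""
    lam, _, c = poisson_params(mu, sigma2, k)
    jumps = 0
    clock = rng.expovariate(lam)            # time of the first jump
    while clock <= horizon:
        jumps += 1
        clock += rng.expovariate(lam)       # waiting time to the next jump
    return c * horizon - k * jumps

rng = random.Random(7)                      # arbitrary seed (assumption)
mu, sigma2, k, horizon = 5.0, 600.0, 5.0, 1.0
draws = [simulate_endpoint(mu, sigma2, k, horizon, rng) for _ in range(4000)]
print("lambda, theta, c:", poisson_params(mu, sigma2, k))
print("mean:", statistics.fmean(draws), "variance:", statistics.variance(draws))
```

The simulated mean and variance at the horizon should be near $\mu \cdot 1 = 5$ and $\sigma^2 \cdot 1 = 600$. Reducing `k` raises $\lambda$ (more, smaller jumps) and makes the endpoint distribution closer to normal, at the cost of more waiting times to generate.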


Part III 


Construction of 
empirical models 









Review of 
mathematical statistics 


9.1 INTRODUCTION 

Before studying empirical models and then parametric models, we review some 
concepts from mathematical statistics. Mathematical statistics is a broad sub¬ 




statistics texts and courses, it receives more in-depth coverage in
Section 12.4. Bayesian methodology also provides the basis for the credibility 
methods covered in Chapter 16. 

To see the need for methods of statistical inference, consider the case where 
your boss needs a model for basic dental payments. One option is to simply 
announce the model. You announce that it is the lognormal distribution with 


$\mu = 5.1239$ and $\sigma = 1.0345$ (the many decimal places are designed to give
your announcement an aura of precision). When your boss, or a regulator, or an
attorney who has put you on the witness stand asks you how you know
that to be so, it will likely not be sufficient to answer that










This is an important point with regard to actuarial practice. What is im¬ 
portant is that an appropriate procedure be used, with everyone understand¬ 
ing that even the best procedure can lead to a poor result once the random 
future outcome has been revealed. This point is stated nicely in a Society of 
Actuaries principles draft ([124], pp. 779-780) regarding the level of adequacy 
of a provision for a block of life risk obligations (that is, the probability that 
the company will have enough money to meet its contractual obligations): 
The indicated level of adequacy is prospective, but the actuarial model 
is generally validated against past experience. It is incorrect to con¬ 
clude on the basis of subsequent experience that the actuarial assump- 


an estimator. Three of them are discussed here. Two examples will be used
throughout to illustrate them.

Example 9.1 A population contains the values 1, 3, 5, and 9. We want to
estimate the population mean by taking a sample of size 2 with replacement.

Example 9.2 A population has the exponential distribution with a mean of
$\theta$. We want to estimate the population mean by taking a sample of size 3 with
replacement.

Both examples are clearly artificial in that we know the answers prior to 





































Let $\theta$ be the quantity we want to estimate. Let $\hat\theta$ be the random variable that
represents the estimator and let $\mathrm{E}(\hat\theta|\theta)$ be the expected value of the estimator
$\hat\theta$ when $\theta$ is the true parameter value.

Definition 9.3 An estimator, $\hat\theta$, is unbiased if $\mathrm{E}(\hat\theta|\theta) = \theta$ for all $\theta$. The
bias is $\mathrm{bias}_{\hat\theta}(\theta) = \mathrm{E}(\hat\theta|\theta) - \theta$.

The bias depends on the estimator being used and may also depend on the
particular value of $\theta$.

Example 9.4 For Example 9.1 determine the bias of the sample mean as an
estimator of the population mean.

The population mean is $\mu = 4.5$. The sample mean is the average of the two
observations. It is also the estimator we would use employing the empirical
approach. In all cases, we assume that sampling is random. In other words,
every sample of size $n$ has the same chance of being drawn. Such sampling
also implies that any member of the population has the same chance of being
observed as any other member. For this example, there are 16 equally likely
ways the sample could have turned out. They are listed below.

1,1  1,3  1,5  1,9
3,1  3,3  3,5  3,9
5,1  5,3  5,5  5,9
9,1  9,3  9,5  9,9

This leads to the following 16 equally likely values for the sample mean:

1  2  3  5
2  3  4  6
3  4  5  7
5  6  7  9

Combining the common values, the sample mean, usually denoted $\bar X$, has
the following probability distribution:
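
The distribution of the sample mean can be tabulated by direct enumeration. The short sketch below does this for the population of Example 9.1; the variable names are illustrative, and exact fractions are used so that no rounding enters the bias calculation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 9]
samples = list(product(population, repeat=2))   # 16 ordered samples, with replacement

# Each ordered sample has probability 1/16; accumulate probability by sample mean.
dist = Counter()
for x1, x2 in samples:
    dist[Fraction(x1 + x2, 2)] += Fraction(1, len(samples))

expected = sum(value * prob for value, prob in dist.items())
pop_mean = Fraction(sum(population), len(population))
for value in sorted(dist):
    print(value, dist[value])
print("E(sample mean) =", expected, " bias =", expected - pop_mean)
```

The expected value of the sample mean equals the population mean of 4.5, so the bias is zero for this example.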


The sample mean is $\bar X = (X_1 + X_2 + X_3)/3$, where each $X_j$ represents one
of the observations from the exponential population. Its expected value is

$$\mathrm{E}(\bar X) = \mathrm{E}\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{1}{3}\left[\mathrm{E}(X_1) + \mathrm{E}(X_2) + \mathrm{E}(X_3)\right] = \frac{1}{3}(\theta + \theta + \theta) = \theta$$

and therefore the sample mean is an unbiased estimator of the population
mean.

Investigating the sample median is a bit more difficult. The distribution
function of the middle of three observations can be found as follows, using
$Y$ as the random variable of interest and $X$ as the random variable for an
observation from the population:

$$\begin{aligned}
F_Y(y) = \Pr(Y \le y) &= \Pr(X_1, X_2, X_3 \le y) + \Pr(X_1, X_2 \le y, X_3 > y) \\
&\quad + \Pr(X_1, X_3 \le y, X_2 > y) + \Pr(X_2, X_3 \le y, X_1 > y) \\
&= F_X(y)^3 + 3F_X(y)^2[1 - F_X(y)] \\
&= [1 - e^{-y/\theta}]^3 + 3[1 - e^{-y/\theta}]^2 e^{-y/\theta}.
\end{aligned}$$

The density function is

$$f_Y(y) = F_Y'(y) = \frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right).$$

The expected value of this estimator is

$$\mathrm{E}(Y|\theta) = \int_0^\infty y\,\frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right) dy = \frac{5\theta}{6}.$$

This estimator is clearly biased,$^1$ with $\mathrm{bias}_Y(\theta) = 5\theta/6 - \theta = -\theta/6$. On
average, this estimator underestimates the true value. It is also easy to see that




Let $Y_n$ be the maximum from a sample of size $n$. Then

$$F_{Y_n}(y) = \Pr(Y_n \le y) = \Pr(X_1 \le y, \ldots, X_n \le y) = [F_X(y)]^n = (y/\theta)^n,$$

$$f_{Y_n}(y) = \frac{n y^{n-1}}{\theta^n}, \quad 0 < y < \theta.$$

The expected value is

$$\mathrm{E}(Y_n|\theta) = \int_0^\theta y\,n y^{n-1}\theta^{-n}\,dy = \frac{n}{n+1}\,y^{n+1}\theta^{-n}\Big|_0^\theta = \frac{n\theta}{n+1}.$$

As $n \to \infty$, the limit is $\theta$, making this estimator asymptotically unbiased. □

9.2.2.3 Consistency A second desirable property of an estimator is that it
works well for extremely large samples. Slightly more formally, as the sample
size goes to infinity, the probability that the estimator is in error by more
than a small amount goes to zero. A formal definition follows.

Definition 9.8 An estimator is consistent (often called, in this context,
weakly consistent) if, for all $\delta > 0$ and any $\theta$,

$$\lim_{n \to \infty} \Pr(|\hat\theta_n - \theta| > \delta) = 0.$$

A sufficient (although not necessary) condition for weak consistency is that
the estimator be asymptotically unbiased and $\mathrm{Var}(\hat\theta_n) \to 0$.
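
Both ideas, asymptotic unbiasedness and consistency, can be seen concretely for the maximum $Y_n$ of $n$ observations from the uniform distribution on $(0, \theta)$. The sketch below uses the bias implied by $\mathrm{E}(Y_n|\theta) = n\theta/(n+1)$, the standard variance $n\theta^2/[(n+1)^2(n+2)]$ of that maximum, and the fact that $\Pr(|Y_n - \theta| > \delta) = [(\theta - \delta)/\theta]^n$ because $Y_n$ cannot exceed $\theta$. The function name and the values of $\theta$ and $\delta$ are illustrative assumptions.

```python
from fractions import Fraction

def max_estimator_summary(n, theta, delta):
    """Exact bias, variance, and Pr(|Y_n - theta| > delta) for the maximum Y_n
    of n independent uniform(0, theta) observations."""
    bias = Fraction(n, n + 1) * theta - theta
    variance = Fraction(n, (n + 1) ** 2 * (n + 2)) * theta ** 2
    # Y_n never exceeds theta, so the error event is {Y_n < theta - delta}.
    miss = (Fraction(theta - delta) / theta) ** n if delta < theta else Fraction(0)
    return bias, variance, miss

theta, delta = Fraction(10), Fraction(1, 2)
for n in (1, 10, 100, 1000):
    bias, variance, miss = max_estimator_summary(n, theta, delta)
    print(n, float(bias), float(variance), float(miss))
```

All three quantities shrink toward zero as $n$ grows, which is the behavior the definition of consistency requires.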


Example 9.9 Prove that, if the variance of a random variable is finite, the
sample mean is a consistent estimator of the population mean.













Example 9.12 Consider the estimator $\hat\theta = 5$ of an unknown parameter $\theta$.
The MSE is $(5 - \theta)^2$, which is very small when $\theta$ is near 5 but becomes poor
for other values. Of course this estimate is both biased and inconsistent unless
$\theta$ is exactly equal to 5.

A result that follows directly from the various definitions is

$$\mathrm{MSE}_{\hat\theta}(\theta) = \mathrm{E}\{[\hat\theta - \mathrm{E}(\hat\theta|\theta) + \mathrm{E}(\hat\theta|\theta) - \theta]^2 \mid \theta\} = \mathrm{Var}(\hat\theta|\theta) + [\mathrm{bias}_{\hat\theta}(\theta)]^2. \qquad (9.1)$$

If we restrict attention to only unbiased estimators, the best such could be
defined as follows.

Definition 9.13 An estimator, $\hat\theta$, is called a uniformly minimum variance
unbiased estimator (UMVUE) if it is unbiased and for any true
value of $\theta$ there is no other unbiased estimator that has a smaller variance.

Because we are looking only at unbiased estimators, it would have been
equally effective to make the definition in terms of MSE. We could also
generalize the definition by looking for estimators that are uniformly best with
regard to MSE, but the previous example indicates why that is not feasible.
There are a few theorems that can assist with the determination of UMVUEs.
However, such estimators are difficult to determine. On the other hand, MSE
is still a useful criterion for comparing two alternative estimators.


The sample mean has

$$\mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{3} = \frac{\theta^2}{3}.$$

When multiplied by 1.2, the sample median has second moment

$$\mathrm{E}[(1.2Y)^2] = 1.44\int_0^\infty y^2\,\frac{6}{\theta}\left(e^{-2y/\theta} - e^{-3y/\theta}\right) dy = \frac{8.64}{\theta}\left(\frac{2\theta^3}{8} - \frac{2\theta^3}{27}\right) = \frac{38\theta^2}{25}$$
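
These moments can be checked by simulation. The sketch below assumes samples of size 3 from an exponential distribution with mean $\theta$ and compares the simulated MSE of the sample mean with $\theta^2/3$ and the simulated MSE of 1.2 times the sample median with $38\theta^2/25 - \theta^2 = 13\theta^2/25$ (the second moment less $\theta^2$, because the scaled median is unbiased). The function name, seed, and number of trials are illustrative assumptions.

```python
import random
import statistics

def simulate_mse(theta, trials, rng):
    """Estimate the MSE of the sample mean and of 1.2 * sample median for
    samples of size 3 from an exponential distribution with mean theta."""
    se_mean, se_median = [], []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0 / theta) for _ in range(3))
        se_mean.append((sum(sample) / 3.0 - theta) ** 2)
        se_median.append((1.2 * sample[1] - theta) ** 2)   # sample[1] is the median
    return statistics.fmean(se_mean), statistics.fmean(se_median)

rng = random.Random(11)                     # arbitrary seed (assumption)
theta = 1.0
mse_mean, mse_median = simulate_mse(theta, 20000, rng)
print("sample mean:   simulated", mse_mean, " theory", theta ** 2 / 3)
print("1.2 x median:  simulated", mse_median, " theory", 13 * theta ** 2 / 25)
```

Under these assumptions the sample mean has the smaller MSE of the two unbiased estimators.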


For Example 9.1, show that the mean of three observations drawn without
replacement is an unbiased estimator of the population mean while the median
of three observations drawn without replacement is a biased estimator of the
population mean.

Prove that for random samples the sample mean is always an unbiased
estimator of the population mean.

9.5 For the sample of size 3 in Exercise 9.3, compare the MSE of the sample
mean and median as estimates of $\theta$.

9.6 (*) You are given two independent estimators of an unknown quantity $\theta$.
For estimator A, $\mathrm{E}(\hat\theta_A) = 1{,}000$ and $\mathrm{Var}(\hat\theta_A) = 160{,}000$, while for estimator
B, $\mathrm{E}(\hat\theta_B) = 1{,}200$ and $\mathrm{Var}(\hat\theta_B) = 40{,}000$. Estimator C is a weighted average,
$\hat\theta_C = w\hat\theta_A + (1 - w)\hat\theta_B$. Determine the value of $w$ that minimizes $\mathrm{Var}(\hat\theta_C)$.

9.7 (*) A population of losses has the Pareto distribution (see Appendix
A) with $\theta = 6{,}000$ and $\alpha$ unknown. Simulation of the results from maximum
likelihood estimation based on samples of size 10 has indicated that $\mathrm{E}(\hat\alpha) = 2.2$
and $\mathrm{MSE}(\hat\alpha) = 1$. Determine $\mathrm{Var}(\hat\alpha)$ if it is known that $\alpha = 2$.

9.8 (*) Two instruments are available for measuring a particular nonzero
distance. The random variable $X$ represents a measurement with the first
instrument, and the random variable $Y$ with the second instrument.
$X$ and $Y$ are independent with $\mathrm{E}(X) = 0.8m$, $\mathrm{E}(Y) = m$, $\mathrm{Var}(X) = m^2$, and
$\mathrm{Var}(Y) = 1.5m^2$, where $m$ is the true distance. Consider estimators of $m$ that
are of the form $Z = \alpha X + \beta Y$. Determine the values of $\alpha$ and $\beta$ that make $Z$
a UMVUE within the class of estimators of this form.

9.9 A population contains six members, with values 1, 1, 2, 3, 5, and 10.
A random sample of size 3 is drawn without replacement. In each case the
objective is to estimate the population mean. Note: A spreadsheet with an
optimization routine may be the best way to solve this problem.

(a) Determine the bias, variance, and MSE of the sample mean.

(b) Determine the bias, variance, and MSE of the sample median.

(c) Determine the bias, variance, and MSE of the sample midrange
(the average of the largest and smallest observations).

(d) Consider an arbitrary estimator of the form $aX_{(1)} + bX_{(2)} + cX_{(3)}$,
where $X_{(1)} \le X_{(2)} \le X_{(3)}$ are the sample order statistics.

i. Determine a restriction on the values of $a$, $b$, and $c$ that will
assure that the estimator is unbiased.

ii. Determine the values of $a$, $b$, and $c$ that will produce the
unbiased estimator with the smallest variance.

iii. Determine the values of $a$, $b$, and $c$ that will produce the
(possibly biased) estimator with the smallest MSE.

9.10 (*) Two different estimators, $\hat\theta_1$ and $\hat\theta_2$, are being considered. To test
their performance, 75 trials have been simulated, each with the true value set
at $\theta = 2$. The following totals were obtained:

$$\sum_{j=1}^{75}\hat\theta_{1j} = 165, \quad \sum_{j=1}^{75}\hat\theta_{1j}^2 = 375, \quad \sum_{j=1}^{75}\hat\theta_{2j} = 147, \quad \sum_{j=1}^{75}\hat\theta_{2j}^2 = 312,$$

where $\hat\theta_{ij}$ is the estimate based on the $j$th simulation using estimator $\hat\theta_i$.
Estimate the MSE for each estimator and determine the relative efficiency
(the ratio of the MSEs).

9.3 INTERVAL ESTIMATION

All of the estimators discussed to this point have been point estimators.
That is, the estimation process produces a single value that represents our
best attempt to determine the value of the unknown population quantity.
While that value may be a good one, we do not expect it to exactly match
the true value. A more useful statement is often provided by an interval
estimator. Instead of a single value, the result of the estimation process is
a range of possible numbers, any of which is likely to be the true value. A
specific type of interval estimator is the confidence interval.

Definition 9.16 A $100(1 - \alpha)\%$ confidence interval for a parameter $\theta$ is
a pair of values $L$ and $U$ computed from a random sample such that
$\Pr(L \le \theta \le U) \ge 1 - \alpha$ for all $\theta$.

Note that this definition does not uniquely specify the interval. Because the
definition is a probability statement and must hold for all $\theta$, it says nothing
about whether or not a particular interval encloses the true value of $\theta$ from a
particular population. Instead, the level of confidence, $1 - \alpha$, is a property of
the method used to obtain $L$ and $U$ and not of the particular values obtained.
The proper interpretation is that, if we use a particular interval estimator
over and over on a variety of samples, at least $100(1 - \alpha)\%$ of the time our
interval will enclose the true value.

Constructing confidence intervals is usually very difficult. For example, we
know that, if a population has a normal distribution with unknown mean and
variance, a $100(1 - \alpha)\%$ confidence interval for the mean uses

$$L = \bar X - t_{\alpha/2,n-1}\,s/\sqrt{n}, \quad U = \bar X + t_{\alpha/2,n-1}\,s/\sqrt{n}, \qquad (9.2)$$

where $s = \sqrt{\sum_{j=1}^n (X_j - \bar X)^2/(n - 1)}$ and $t_{\alpha/2,b}$ is the $100(1 - \alpha/2)$th
percentile of the $t$ distribution with $b$ degrees of freedom. But it takes a great
deal of effort to verify that this is correct (see, for example, [58], p. 214).

However, there is a method for constructing approximate confidence intervals
that is often accessible. Suppose we have a point estimator $\hat\theta$ of parameter
$\theta$ such that $\mathrm{E}(\hat\theta) \doteq \theta$, $\mathrm{Var}(\hat\theta) \doteq v(\theta)$, and $\hat\theta$ has approximately a normal
distribution. Theorem 12.13 shows that this is often the case. With all these
approximations, we have that approximately

$$1 - \alpha \doteq \Pr\left(-z_{\alpha/2} \le \frac{\hat\theta - \theta}{\sqrt{v(\theta)}} \le z_{\alpha/2}\right), \qquad (9.3)$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard normal distribution.
Solving for $\theta$ produces the desired interval. Sometimes this is difficult to do
(due to the appearance of $\theta$ in the denominator) and so, if necessary, replace
$v(\theta)$ in (9.3) with $v(\hat\theta)$ to obtain a further approximation,

$$1 - \alpha \doteq \Pr\left(\hat\theta - z_{\alpha/2}\sqrt{v(\hat\theta)} \le \theta \le \hat\theta + z_{\alpha/2}\sqrt{v(\hat\theta)}\right). \qquad (9.4)$$


Example 9.17 Use (9.4) to construct an approximate 95% confidence interval
for the mean of a normal population with unknown variance.

Use $\hat\theta = \bar X$ and then note that $\mathrm{E}(\hat\theta) = \theta$, $\mathrm{Var}(\hat\theta) = \sigma^2/n$, and $\hat\theta$ does have a
normal distribution. The confidence interval is then $\bar X \pm 1.96\,s/\sqrt{n}$. Because
$t_{0.025,n-1} > 1.96$, this approximate interval must be narrower than the exact
interval given by (9.2). That means that our level of confidence is something
less than 95%. □


Example 9.18 Use (9.3) and (9.4) to construct approximate 95% confidence
intervals for the mean of a Poisson distribution. Obtain intervals for the
particular case where $n = 25$ and $\bar x = 0.12$.

Let $\hat\theta = \bar X$, the sample mean. For the Poisson distribution, $\mathrm{E}(\hat\theta) = \mathrm{E}(X) = \theta$
and $v(\theta) = \mathrm{Var}(\bar X) = \mathrm{Var}(X)/n = \theta/n$. For the first interval

$$0.95 = \Pr\left(-1.96 \le \frac{\bar X - \theta}{\sqrt{\theta/n}} \le 1.96\right)$$

is true if and only if

$$|\bar X - \theta| \le 1.96\sqrt{\theta/n},$$

which is equivalent to

$$(\bar X - \theta)^2 \le \frac{3.8416\,\theta}{n} \quad \text{or} \quad \theta^2 - \theta\left(2\bar X + \frac{3.8416}{n}\right) + \bar X^2 \le 0.$$

Solving the quadratic produces the interval

$$\bar X + \frac{1.9208}{n} \pm \frac{1}{2}\sqrt{\frac{15.3664\,\bar X}{n} + \frac{3.8416^2}{n^2}}$$

and for this problem the interval is $0.197 \pm 0.156$.

For the second approximation the interval is $\bar X \pm 1.96\sqrt{\bar X/n}$ and for the
example it is $0.12 \pm 0.136$. This interval extends below zero (which is not possible
for the true value of $\theta$). This is because (9.4) is too crude an approximation
in this case. □
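
The two intervals of Example 9.18 can be reproduced with a few lines of code. This is a minimal sketch: the function name `poisson_mean_intervals` and its return layout are hypothetical, and $z = 1.96$ is the 97.5th percentile of the standard normal distribution used in the example.

```python
import math

def poisson_mean_intervals(xbar, n, z=1.96):
    """Approximate confidence intervals for a Poisson mean.
    Returns (center, half_width) for the interval from (9.3), obtained by
    solving the quadratic in theta, and for the cruder interval from (9.4)."""
    z2 = z * z
    center_93 = xbar + z2 / (2.0 * n)
    half_93 = 0.5 * math.sqrt(4.0 * z2 * xbar / n + (z2 / n) ** 2)
    half_94 = z * math.sqrt(xbar / n)
    return (center_93, half_93), (xbar, half_94)

(c93, h93), (c94, h94) = poisson_mean_intervals(0.12, 25)
print("from (9.3): %.3f +/- %.3f" % (c93, h93))
print("from (9.4): %.3f +/- %.3f" % (c94, h94))
print("lower limits:", c93 - h93, c94 - h94)
```

Only the interval based on (9.4) has a negative lower limit; solving the quadratic from (9.3) keeps the interval inside the parameter space.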
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9.3.1 Exercise 

9.11 Let $X_1, \ldots, X_n$ be a random sample from a population with pdf
$f(x) = \theta^{-1}e^{-x/\theta}$, $x > 0$. This exponential distribution has a mean of $\theta$ and a variance
of $\theta^2$. Consider the sample mean, $\bar X$, as an estimator of $\theta$. It turns out that
$\bar X/\theta$ has a gamma distribution with $\alpha = n$ and $\theta = 1/n$, where in the second
expression the "$\theta$" on the left is the parameter of the gamma distribution. For
a sample of size 50 and a sample mean of 275, develop 95% confidence intervals
by each of the following methods. In each case, if the formula requires the
true value of $\theta$, substitute the estimated value.

(a) Use the gamma distribution to determine an exact interval.

(b) Use a normal approximation, estimating the variance prior to
solving the inequalities.

(c) Use a normal approximation, estimating $\theta$ after solving the
inequalities.


9.4 TESTS OF HYPOTHESES

Hypothesis testing is covered in detail in most mathematical statistics texts.
This review will be fairly straightforward and will not address philosophical
issues or consider alternative approaches. A hypothesis test begins with two
hypotheses, one called the null and one called the alternative. The traditional
notation is $H_0$ for the null hypothesis and $H_1$ for the alternative hypothesis.
The two hypotheses are not treated symmetrically. Reversing them may alter
the results. To illustrate this process, a simple example will be used.

Example 9.19 Your company has been basing its premiums on an assumption
that the average claim is 1,200. You want to raise the premium and a
regulator has insisted that you provide evidence that the average now exceeds
1,200. To provide such evidence, the following numbers have been obtained.
What are the hypotheses for this problem?

 27    82   115   126   155   161   243   294   340   384
457   680   855   877   974 1,193 1,340 1,884 2,558 15,743

Let $\mu$ be the population mean. One hypothesis (the one you claim is true) is
that $\mu > 1{,}200$. Because hypothesis tests must present an either/or situation,
the other hypothesis must be $\mu \le 1{,}200$. The only remaining task is to decide
which of them is the null hypothesis. Whenever the universe of continuous
possibilities is divided in two there is likely to be a boundary that needs to
be assigned to one hypothesis or the other. The hypothesis that includes
the boundary must be the null hypothesis. Therefore, the problem can be
succinctly stated as:

$H_0: \mu \le 1{,}200$

$H_1: \mu > 1{,}200$. □


The decision is made by calculating a quantity called a test statistic. It
is a function of the observations and is treated as a random variable. That is,
in designing the test procedure we are concerned with the samples that might
have been obtained and not with the particular sample that was obtained.
The test specification is completed by constructing a rejection region. It
is a subset of the possible values of the test statistic. If the value of the test
statistic for the observed sample is in the rejection region, the null hypothesis
is rejected and the alternative hypothesis is announced as the result that is
supported by the data. Otherwise, the null hypothesis is not rejected (more
on this later). The boundaries of the rejection region (other than plus or
minus infinity) are called the critical values.

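Example 9.20 (Example 9.19 continued) Complete the test using the test
statistic and rejection region that is promoted in most statistics books. Assume
that the population has a normal distribution with standard deviation 3,435.

The traditional test statistic for this problem is

$$z = \frac{\bar x - 1{,}200}{3{,}435/\sqrt{20}} = \frac{1{,}424.4 - 1{,}200}{3{,}435/\sqrt{20}} = 0.292$$

and the null hypothesis is rejected if $z > 1.645$. Because 0.292 is less than
1.645, the null hypothesis is not rejected. The data do not support the
assertion that the average claim exceeds 1,200. □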
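
The calculation in this example can be reproduced directly. In the sketch below the twenty observations are an assumption wherever the printed data row is illegible; the values used reproduce the sample mean of 1,424.4 that is implied by the reported statistic of 0.292. The function name `z_statistic` is hypothetical.

```python
import math

# Twenty claim amounts; the first row of the printed data is partly illegible,
# so these values are an assumption consistent with the reported z of 0.292.
claims = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

def z_statistic(data, mu0, sigma):
    """One-sample z statistic with known population standard deviation."""
    n = len(data)
    xbar = sum(data) / n
    return (xbar - mu0) / (sigma / math.sqrt(n))

z = z_statistic(claims, 1200.0, 3435.0)
critical = 1.645                      # upper 5% point of the standard normal
print("sample mean:", sum(claims) / len(claims))
print("z =", round(z, 3), "->", "reject H0" if z > critical else "do not reject H0")
```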

The test in the previous example was constructed to meet certain objec¬ 
tives. The first objective is to control what is called the Type I error. It is th 


happens to 


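Example 9.22 Determine the level of significance for the test in Example
9.20.

Begin by computing the probability of making a Type I error when the null
hypothesis is true with $\mu = 1{,}200$. Then,

$$\Pr(Z > 1.645 \mid \mu = 1{,}200) = 0.05.$$

That is because the assumptions imply that $Z$ has a standard normal
distribution.

Now suppose $\mu$ has a value that is below 1,200. Then

$$\begin{aligned}
\Pr\left(\frac{\bar X - 1{,}200}{3{,}435/\sqrt{20}} > 1.645\right) &= \Pr\left(\frac{\bar X - \mu + \mu - 1{,}200}{3{,}435/\sqrt{20}} > 1.645\right) \\
&= \Pr\left(\frac{\bar X - \mu}{3{,}435/\sqrt{20}} > 1.645 - \frac{\mu - 1{,}200}{3{,}435/\sqrt{20}}\right).
\end{aligned}$$

Because $\mu$ is known to be less than 1,200, the right-hand side is always greater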



test exists that has the same or lower significance level and for a particular 
value within the alternative hypothesis has a smaller probability of making a 
Type II error. 

Example 9.24 (Example 9.22 continued) Determine the probability of making
a Type II error when the alternative hypothesis is true with $\mu = 2{,}000$.

$$\Pr\left(\frac{\bar{X} - 1{,}200}{3{,}435/\sqrt{20}} < 1.645 \,\Big|\, \mu = 2{,}000\right)
= \Pr(\bar{X} - 1{,}200 < 1{,}263.51 \mid \mu = 2{,}000)$$
$$= \Pr(\bar{X} < 2{,}463.51 \mid \mu = 2{,}000)$$
$$= \Pr\left(\frac{\bar{X} - 2{,}000}{3{,}435/\sqrt{20}} < \frac{2{,}463.51 - 2{,}000}{3{,}435/\sqrt{20}} = 0.6035\right) = 0.7269.$$

For this value of $\mu$, the test is not very powerful, having over a 70% chance of
making a Type II error. Nevertheless (though this is not easy to prove), the
test used is the most powerful test for this problem. □
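The Type I and Type II error probabilities in Examples 9.22 and 9.24 can be checked numerically. The following is a minimal sketch that assumes only the quantities stated in the examples (standard deviation 3,435, sample size 20, critical value 1.645); the helper names `normal_cdf` and `type_ii_probability` are illustrative and do not come from the text.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma, n = 3435.0, 20      # assumed population standard deviation and sample size
mu0 = 1200.0               # boundary value in the null hypothesis
z_crit = 1.645             # critical value for a one-sided 5% test

std_err = sigma / math.sqrt(n)
xbar_crit = mu0 + z_crit * std_err   # H0 is rejected when the sample mean exceeds this

def type_ii_probability(mu):
    """Probability of failing to reject H0 when the true mean is mu."""
    return normal_cdf((xbar_crit - mu) / std_err)

alpha = 1.0 - normal_cdf(z_crit)     # Type I error probability at mu = 1,200
beta = type_ii_probability(2000.0)   # Type II error probability at mu = 2,000

print(round(xbar_crit, 2), round(alpha, 4), round(beta, 4))
```

Evaluating `type_ii_probability` over a range of means above 1,200 shows how the chance of a Type II error falls as the true mean moves away from the boundary.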

Because the Type II error probability can be high, it is customary to not
make a strong statement when the null hypothesis is not rejected. Rather
than say we choose or accept the null hypothesis, we say that we fail to reject
it. That is, there was not enough evidence in the sample to make a strong
argument in favor of the alternative hypothesis, so we take no stand at all.

A common criticism of this approach to hypothesis testing is that the choice 
of the significance level is arbitrary. In fact, by changing the significance level, 
any result can be obtained. 

Example 9.25 (Example 9.24 continued) Complete the test using a signifi¬ 
cance level of a = 0.45. Then determine the range of significance levels for 
which the null hypothesis is rejected and for which it is not rejected. 

Because $\Pr(Z > 0.1257) = 0.45$, the null hypothesis is rejected when

$$\frac{\bar{X} - 1{,}200}{3{,}435/\sqrt{20}} > 0.1257.$$

In this example, the test statistic is 0.292, which is in the rejection region,
and thus the null hypothesis is rejected. Of course, few people would place
confidence in the results of a test that was designed to make errors 45% of
the time. Because $\Pr(Z > 0.292) = 0.3851$, the null hypothesis is rejected for
those who select a significance level that is greater than 38.51% and is not
rejected by those who use a significance level that is less than 38.51%. □

Few people are willing to make errors 38.51% of the time. Announcing this
figure is more persuasive than the earlier conclusion based on a 5% significance
level. When a significance level is used, readers are left to wonder what the
outcome would have been with other significance levels. The value of 38.51%
is called a p-value. A working definition is:

Definition 9.26 For a hypothesis test, the p-value is the probability that the
test statistic takes on a value that is less in agreement with the null hypothesis
than the value obtained from the sample. Tests conducted at a significance level
that is greater than the p-value will lead to a rejection of the null hypothesis,
while tests conducted at a significance level that is smaller than the p-value
will lead to a failure to reject the null hypothesis.
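The p-value of 38.51% can be reproduced directly from data. The sketch below assumes that the sample behind the test is Data Set B (Table 10.2), whose mean of 1,424.4 reproduces the test statistic 0.292; the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Data Set B (Table 10.2): 20 workers compensation medical payments
data_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

sigma = 3435.0   # assumed known population standard deviation
mu0 = 1200.0     # boundary value in the null hypothesis

n = len(data_b)
xbar = sum(data_b) / n
z = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic
p_value = 1.0 - normal_cdf(z)               # one-sided (upper tail) p-value

def reject_null(significance_level):
    """H0 is rejected exactly when the significance level exceeds the p-value."""
    return significance_level > p_value

print(round(z, 3), round(p_value, 4), reject_null(0.05), reject_null(0.45))
```

Reporting `p_value` lets each reader apply a preferred significance level rather than relying on the analyst's choice.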

Also, because the p-value must be between 0 and 1, it is on a scale that
carries some meaning. The closer to zero the value is, the more support the
data give to the alternative hypothesis. Common practice is that values above
10% indicate that the data provide no evidence in support of the alternative
hypothesis, while values below 1% indicate strong support for the alternative
hypothesis. Values in between indicate uncertainty as to the appropriate
conclusion and may call for more data or a more careful look at the data or
the experiment that produced them.

9.4.1 Exercise 

9.12 (Exercise 9.11 continued) Test $H_0: \theta \ge 325$ vs $H_1: \theta < 325$ using
a significance level of 5% and the sample mean as the test statistic. Also,
compute the p-value. Do this using the exact distribution of the test statistic
and a normal approximation.
10.1 INTRODUCTION

creases as trie numotr uj u>u,um ^ — 。 

Definition 10.2 A parametric distribution is a set of distribution func¬ 
tions, each member of which is determined by specifying one or more values 
called parameters. The number of parameters is fixed and finite. 

Here only two data-dependent distributions will be considered. They depend
on the data in similar ways. The simplest definitions for the two types
considered appear below.

Definition 10.3 The empirical distribution is obtained by assigning
probability 1/n to each data point.

Definition 10.4 A kernel smoothed distribution is obtained by replacing
each data point with a continuous random variable and then assigning
probability 1/n to each such random variable. The random variables used must be
identical except for a location or scale change that is related to its associated
data point.

Note that the empirical distribution is a special type of kernel smoothed
distribution in which the random variable assigns probability 1 to the data
point. An alternative to the empirical distribution that is similar in spirit but
produces different numbers will also be presented. In Chapter 11 it will be
shown how the definition can be modified to account for data that have been
altered through censoring and truncation. With regard to kernel smoothing,
there are several distributions that could be used, a few of which are
introduced in Section 11.3.

Throughout this part, four examples will be used repeatedly. Because they
are simply data sets, they will be referred to as Data Sets A, B, C, and D.

Data Set A This data set is well-known in the casualty actuarial literature.
It was first analyzed in the paper [30] by Dropkin in 1959. He collected data
from 1956-1958 on the number of accidents by one driver in one year. The
results for 94,935 drivers are in Table 10.1.

Data Set B These numbers are artificial. They represent the amounts paid
on workers compensation medical benefits but are not related to any particular
policy or set of policyholders. These payments are the full amount of the loss.
A random sample of 20 payments is given in Table 10.2.




Table 10.1 Data Set A

Number of accidents    Number of drivers
0                      81,714
1                      11,306
2                      1,618
3                      250
4                      40
5 or more              7


Table 10.2 Data Set B 


27 82 115 126 155 161 243 294 340 384 

457 680 855 877 974 1,193 1,340 1,884 2,558 15,743 


Table 10.3 Data Set C

Payment range        Number of payments
0-7,500              99
7,500-17,500         42
17,500-32,500        29
32,500-67,500        28
67,500-125,000       17
125,000-300,000      9
Over 300,000         3

contract), and for the remainder it is expiration of the five-year period. Two
separate versions are presented. For Data Set D1 (Table 10.4) there were 30
policies observed from issue. For each, both the time of death and time of
surrender are presented, provided they were before the expiration of the five-year
period. Of course, normally we do not know the time of death of policyholders
who surrender and we do not know when policyholders who died would have
surrendered had they not died. Note that the final 12 policyholders survived
both death and surrender to the end of the five-year period.

For Data Set D2 (Table 10.5), only the time of the first event is observed.
In addition, there are 10 more policyholders who were first observed at some
time after the policy was issued. The table presents the results for all 40

0 

1 

2 

3 

4 

5 or more 
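is a step function with jumps at each data point.

$$F_{94,935}(x) = \begin{cases}
0/94{,}935 = 0.000000, & x < 0, \\
81{,}714/94{,}935 = 0.860736, & 0 \le x < 1, \\
93{,}020/94{,}935 = 0.979828, & 1 \le x < 2, \\
94{,}638/94{,}935 = 0.996872, & 2 \le x < 3, \\
94{,}888/94{,}935 = 0.999505, & 3 \le x < 4, \\
94{,}928/94{,}935 = 0.999926, & 4 \le x < 5, \\
94{,}935/94{,}935 = 1.000000, & x \ge 5.
\end{cases}$$

For Data Set B,

$$p_{20}(x) = \begin{cases}
0.05, & x = 27, \\
0.05, & x = 82, \\
0.05, & x = 115, \\
\;\vdots & \;\vdots \\
0.05, & x = 15{,}743.
\end{cases}$$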



10.2 THE EMPIRICAL DISTRIBUTION FOR COMPLETE, 
INDIVIDUAL DATA 

recorded. An alternative definition follows. 

Definition 10.5 The empirical distribution function is

$$F_n(x) = \frac{\text{number of observations} \le x}{n},$$

where n is the total number of observations.

Example 10.6 Provide the empirical probability functions for the data in 
Example 10.8 Consider a data set containing the numbers 1.0, 1.3, 1.5, 1.5,
2.1, 2.1, 2.1, and 2.8. Determine the quantities described in the previous
paragraph and then obtain the empirical distribution function.

There are five unique values and thus $k = 5$. Values of $y_j$, $s_j$, and $r_j$ are
given in Table 10.6.


As noted in the example, the empirical model is a discrete distribution. 
Therefore, the derivative required to create the density and hazard rate func¬ 
tions cannot be taken. The closest we can come, in an empirical sense, to 
estimating the hazard rate function is to estimate the cumulative hazard rate 
function, defined as follows. 

Definition 10.7 The cumulative hazard rate function is defined as

$$H(x) = -\ln S(x).$$

The name comes from the fact that, if $S(x)$ is differentiable,

$$H'(x) = -\frac{S'(x)}{S(x)} = \frac{f(x)}{S(x)} = h(x),$$

and then

$$H(x) = \int_{-\infty}^{x} h(y)\,dy.$$

The distribution function can be obtained from $F(x) = 1 - S(x) = 1 - e^{-H(x)}$.
Therefore, estimating the cumulative hazard function provides an alternative
way to estimate the distribution function.

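In order to define empirical estimates, some additional notation is needed.
For a sample of size n, let $y_1 < y_2 < \cdots < y_k$ be the k unique values that
appear in the sample, where k must be less than or equal to n. Let $s_j$ be the
number of times the observation $y_j$ appears in the sample. Thus, $\sum_{j=1}^{k} s_j = n$.
Also of interest is the number of observations in the data set that are greater
than or equal to a given value. Both the observations and the number of
observations are referred to as the risk set. Let $r_j = \sum_{i=j}^{k} s_i$ be the number
of observations greater than or equal to $y_j$. Using this notation, the empirical
distribution function is

$$F_n(x) = \begin{cases}
0, & x < y_1, \\
1 - \dfrac{r_j}{n}, & y_{j-1} \le x < y_j,\ j = 2, \ldots, k, \\
1, & x \ge y_k.
\end{cases}$$

The Nelson-Aalen estimate of the cumulative hazard rate function is

$$\hat{H}(x) = \begin{cases}
0, & x < y_1, \\
\sum_{i=1}^{j-1} \dfrac{s_i}{r_i}, & y_{j-1} \le x < y_j,\ j = 2, \ldots, k, \\
\sum_{i=1}^{k} \dfrac{s_i}{r_i}, & x \ge y_k.
\end{cases}$$

Because this is a step function, its derivatives (which would provide an
estimate of the hazard rate function) are not interesting. An intuitive derivation
of this estimator can be found on page 302.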

Example 10.10 Determine the Nelson-Aalen estimate for the previous
example.

We have

$$\hat{H}(x) = \begin{cases}
0, & x < 1.0, \\
\frac{1}{8} = 0.125, & 1.0 \le x < 1.3, \\
0.125 + \frac{1}{7} = 0.268, & 1.3 \le x < 1.5, \\
0.268 + \frac{2}{6} = 0.601, & 1.5 \le x < 2.1, \\
0.601 + \frac{3}{4} = 1.351, & 2.1 \le x < 2.8, \\
1.351 + \frac{1}{1} = 2.351, & x \ge 2.8.
\end{cases}$$
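The risk set notation and the two estimates can be computed mechanically. The sketch below uses the eight-point data set from Example 10.8; the function names are illustrative, and the code assumes complete individual data with no truncation or censoring.

```python
from collections import Counter

def risk_set_table(sample):
    """Return (y_j, s_j, r_j) for each unique value y_j, in increasing order."""
    counts = Counter(sample)
    remaining = len(sample)          # observations greater than or equal to y_j
    table = []
    for y in sorted(counts):
        s = counts[y]
        table.append((y, s, remaining))
        remaining -= s
    return table

def empirical_cdf(table, n, x):
    """F_n(x): the proportion of observations less than or equal to x."""
    return sum(s for y, s, _ in table if y <= x) / n

def nelson_aalen(table, x):
    """Cumulative hazard estimate: the sum of s_j / r_j over all y_j <= x."""
    return sum(s / r for y, s, r in table if y <= x)

sample = [1.0, 1.3, 1.5, 1.5, 2.1, 2.1, 2.1, 2.8]
table = risk_set_table(sample)
for y, s, r in table:
    print(y, s, r,
          round(empirical_cdf(table, len(sample), y), 3),
          round(nelson_aalen(table, y), 3))
```

Each printed row gives a unique value, its multiplicity, its risk set, and the two estimates evaluated at that value.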



For this particular problem, where it is known that all policyholders
terminate at time 5, results past 5 are not interesting. The methods of obtaining
an empirical distribution that have been introduced work only when the
individual observations are available and there is no truncation or censoring. The
following chapter will introduce modifications for those situations.

10.2.1 Exercises 

10.1 Obtain the empirical distribution function and the Nelson-Aalen
estimate of the distribution function for the time to surrender using Data Set D1.
Assume that the surrender time is known for those who die.

10.2 The data in Table 10.8 are from Loss Distributions [59], p. 128. They
represent the total damage done by 35 hurricanes between the years 1949
and 1980. The losses have been adjusted for inflation (using the Residential
Construction Index) to be in 1981 dollars. The entries represent all hurricanes
for which the trended loss was in excess of 5,000,000.







(a) Estimate the mean, standard deviation, coefficient of variation, and 
skewness for the population of hurricane losses. 


(b) Estimate the first and second limited moments at 500,000,000. 


Determine the empirical skewness coefficient. 


10.3 EMPIRICAL DISTRIBUTIONS FOR GROUPED DATA 
but whatever approach is used, consistency in assignment of observations to
groups should be used. You will note that in Data Set C it is not possible to
tell how the assignments were made. If we had that knowledge, it would not
affect any subsequent calculations.

Definition 10.12 For grouped data, the distribution function obtained by
connecting the values of the empirical distribution function at the group
boundaries with straight lines is called the ogive. The formula is

$$F_n(x) = \frac{c_j - x}{c_j - c_{j-1}} F_n(c_{j-1}) + \frac{x - c_{j-1}}{c_j - c_{j-1}} F_n(c_j), \quad c_{j-1} \le x \le c_j.$$

This function is differentiable at all values except group boundaries.
Therefore the density function can be obtained. To completely specify the density
function, it will arbitrarily be made right-continuous.

Definition 10.13 For grouped data, the empirical density function can be
obtained by differentiating the ogive. The resulting function is called a
histogram. The formula is

$$f_n(x) = \frac{F_n(c_j) - F_n(c_{j-1})}{c_j - c_{j-1}} = \frac{n_j}{n(c_j - c_{j-1})}, \quad c_{j-1} \le x < c_j.$$


Many computer programs that produce histograms actually create a bar 
chart with bar heights proportional to nj/n. This is acceptable if the groups 
have equal width, but if not, then the above formula is needed. The ad¬ 
vantage of this approach is that the histogram is indeed a density function, 
and among other things, areas under the histogram can be used to obtain 
empirical probabilities. 


Example 10.14 Construct the ogive and histogram for Data Set C. 


The distribution function is

$$F_{227}(x) = \begin{cases}
0.000058150x, & 0 \le x \le 7{,}500, \\
0.29736 + 0.000018502x, & 7{,}500 \le x \le 17{,}500, \\
0.47210 + 0.000008517x, & 17{,}500 \le x \le 32{,}500, \\
0.63436 + 0.000003524x, & 32{,}500 \le x \le 67{,}500, \\
0.78433 + 0.000001302x, & 67{,}500 \le x \le 125{,}000, \\
0.91882 + 0.000000227x, & 125{,}000 \le x \le 300{,}000, \\
\text{undefined}, & x > 300{,}000,
\end{cases}$$

where, for example, for the range $32{,}500 \le x \le 67{,}500$ the calculation is

$$F_{227}(x) = \frac{67{,}500 - x}{67{,}500 - 32{,}500} \cdot \frac{170}{227} + \frac{x - 32{,}500}{67{,}500 - 32{,}500} \cdot \frac{198}{227}.$$


The value is undefined above 300,000 because the last interval has a width of 
infinity. A graph of the ogive for values up to 125,000 appears in Figure 10.1. 
Fig. 10.2 Histogram of general liability losses.


The derivative is simply a step function with the following values.

$$f_{227}(x) = \begin{cases}
0.000058150, & 0 \le x < 7{,}500, \\
0.000018502, & 7{,}500 \le x < 17{,}500, \\
0.000008517, & 17{,}500 \le x < 32{,}500, \\
0.000003524, & 32{,}500 \le x < 67{,}500, \\
0.000001302, & 67{,}500 \le x < 125{,}000, \\
0.000000227, & 125{,}000 \le x < 300{,}000, \\
\text{undefined}, & x \ge 300{,}000.
\end{cases}$$

A graph of the function up to 125,000 appears in Figure 10.2. □
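The ogive and histogram for grouped data follow directly from the group boundaries and counts. The sketch below applies Definitions 10.12 and 10.13 to Data Set C (Table 10.3); the function names are illustrative, and returning `None` where the functions are undefined is a choice made for this sketch.

```python
boundaries = [0, 7500, 17500, 32500, 67500, 125000, 300000]
counts = [99, 42, 29, 28, 17, 9, 3]   # last entry is the open "Over 300,000" group
n = sum(counts)

# Empirical distribution function at each finite group boundary
cumulative = [0]
for c in counts[:-1]:
    cumulative.append(cumulative[-1] + c)
F_at_boundary = [c / n for c in cumulative]

def ogive(x):
    """Linear interpolation of F_n between boundaries; None above the last one."""
    if x < boundaries[0]:
        return 0.0
    for j in range(1, len(boundaries)):
        lo, hi = boundaries[j - 1], boundaries[j]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            return (1 - w) * F_at_boundary[j - 1] + w * F_at_boundary[j]
    return None

def histogram(x):
    """Empirical density n_j / (n * width) on [c_{j-1}, c_j); None elsewhere."""
    for j in range(1, len(boundaries)):
        lo, hi = boundaries[j - 1], boundaries[j]
        if lo <= x < hi:
            return counts[j - 1] / (n * (hi - lo))
    return None

print(n, round(ogive(50000), 5), histogram(10000))
```

Because the density is divided by the group width, the area under the histogram over any group equals that group's empirical probability, even when the widths differ.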


Table 10.9 Data for Exercise 10.4

Payment range    Number of payments
0-25             6
25-50            24
50-75            30
75-100           31
100-150          57
150-250          80
250-500          85
500-1000         54
1000-2000        15
2000-4000        10
Over 4000        0


10.3.1 Exercises 

10.4 Construct the ogive and histogram for the data in Table 10.9. 

10.5 (*) The following 20 wind losses (in millions of dollars) were recorded
in one year:

1 1 1 1 1 2 2 3 3 4

6 6 8 10 13 14 15 18 22 25

(a) Construct an ogive based on using class boundaries at 0.5, 2.5, 8.5,
15.5, and 29.5.

(b) Construct a histogram using the same boundaries as in part (a). 

10.6 The data in Table 10.10 are from Herzog and Laverty [54]. A certain
class of 15-year mortgages was followed from issue until December 31, 1993.
The issues were split into those that were refinances of existing mortgages
and those that were original issues. Each entry in the table provides the
number of issues and the percentage of them that were still in effect after the
indicated number of years. Draw as much of the two ogives (on the same
graph) as is possible from the data. Does it appear from the ogives that the
lifetime variable (time to mortgage termination) has a different distribution
for refinanced versus original issues?

10.7 (*) The data in Table 10.11 were collected (units are millions of dollars). 
Construct the histogram. 


10.8 (*) Forty losses have been observed. Sixteen are between 1 and 4/3 and
those 16 losses total 20. Ten losses are between 4/3 and 2 with a total of 15.
Ten more are between 2 and 4 with a total of 35. The remaining 4 losses
Estimation for modified data


11.1 POINT ESTIMATION 

It is not unusual for data to be incomplete due to censoring or truncation. 
The formal definitions are as follows. 


Definition 11.1 An observation is truncated from below (also called left
truncated) at d if when it is below d it is not recorded but when it is above d
it is recorded at its observed value.

An observation is truncated from above (also called right truncated) at
u if when it is above u it is not recorded but when it is below u it is recorded
at its observed value.


An observation is censored from below (also called left censored) at d if, 






— 
















distinction will be made between the two cases. 

To perform the estimation, the raw data must be summarized in a useful
manner. The most interesting values are the uncensored observations. Let
$y_1 < y_2 < \cdots < y_k$ be the k unique values of the $x_j$s that appear in the sample,
where k must be less than or equal to the number of uncensored observations.
Let $s_j$ be the number of times the uncensored observation $y_j$ appears in the








$$r_j = (\text{number of } d_i\text{s} < y_j) - (\text{number of } x_i\text{s} < y_j) - (\text{number of } u_i\text{s} < y_j). \tag{11.1}$$
easier to conceptualize because it includes all who 
r to the given age less those who have already left, 
risk set is the number of people observed alive at 
amounts, the risk set is the number of policies with 
er the actual amount or the maximum amount due 

















)ne died prior to 2 / 1 , the survival function remains at 1 until this 


king conditionally, just before yi, there were ri people available 
^hich si did so. Thus, the probability of surviving past yi is 
This becomes the value of S(yi) and the survival function remains 
























loring, it is possible that at the age of the last death there 
re but all were censored prior to death. We know that 
I observed death age is possible, but there is no empirical 
plete the survival function. One option (the first one used 


Then, for t>w, 
$$S_n(t) = e^{(t/w)\ln s^*} = (s^*)^{t/w}, \quad \text{where } s^* = \prod_{i=1}^{k} \frac{r_i - s_i}{r_i}.$$

There is an alternative method of obtaining the values of $s_j$ and $r_j$ that is
more suitable for Excel® spreadsheet work.¹ The steps are as follows:

1. There should be one row for each data point. The points need not be
in any particular order.

2. Each row should have three entries. The first entry should be $d_j$, the
second entry should be $u_j$ or $x_j$ (there can be only one of these). The
third entry should be the letter "x" (without the quotes) if the second
entry is an x-value (an observed value) and should be the letter "u" if
the second entry is a u-value (a censored value). Assume, for example,
that the $d_j$s occupy cells B6:B45, the $u_j$s and $x_j$s occupy C6:C45, and
the x/u letters occupy D6:D45.

3. Create the columns for y, r, and s as follows.

4. For the ordered observed values (say they should begin in cell F2), start
with the lowest d-value. The formula in F2 is =MIN(B6:B45).

5. Then in cell F3 enter the formula

=MIN(IF(C$6:C$45>F2,IF(D$6:D$45="x",C$6:C$45,1E36),1E36)).

Because this is an array formula, it must be entered with Ctrl-Shift-Enter.
Copy this formula into cells F4, F5, and so on until the value
1E36 appears. Column F should now contain the unique, ordered,
y-values.

6. In cell G3 enter the formula

=COUNTIF(B$6:B$45,"<"&F3)-COUNTIF(C$6:C$45,"<"&F3).

Copy this formula into cells G4, G5, and so on until values appear across
from all but the last value in column F. This column contains the risk
set values.

7. In cell H3 enter the formula

=SUM(IF(C$6:C$45=F3,IF(D$6:D$45="x",1,0),0)).

Enter this array formula with Ctrl-Shift-Enter and then copy it into H4,
H5, and so on to match the rows of column G. This column contains
the s-values.

8. Begin the calculation of S(y) by entering a 1 in cell I2.

9. Calculate the next S(y) value by entering the following formula in cell
I3.

=I2*(G3 - H3)/G3.

¹This scheme was devised by Charles Thayer and improved by Margie Rosenberg. It is a
great improvement over the author's scheme presented in earlier drafts. These instructions
work with Office XP and should work in a similar manner for other versions.
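The spreadsheet steps above translate into a few lines of code. The following sketch mirrors the same layout, one row per data point holding a truncation point d, a value, and an "x" or "u" flag; the eight rows are made-up illustrative values, not one of the text's data sets, and the function names are assumptions of this sketch.

```python
def risk_sets(rows):
    """rows: (d, value, kind) with kind 'x' (observed) or 'u' (censored).

    Returns (y_j, r_j, s_j) for each unique uncensored value, where
    r_j = (number of d < y_j) - (number of values < y_j), as in the COUNTIF step.
    """
    ys = sorted({value for _, value, kind in rows if kind == "x"})
    table = []
    for y in ys:
        entered = sum(1 for d, _, _ in rows if d < y)
        left = sum(1 for _, value, _ in rows if value < y)
        s = sum(1 for _, value, kind in rows if kind == "x" and value == y)
        table.append((y, entered - left, s))
    return table

def product_limit(table):
    """Running product of (r_j - s_j) / r_j, starting from 1."""
    surv = 1.0
    estimates = []
    for y, r, s in table:
        surv *= (r - s) / r
        estimates.append((y, surv))
    return estimates

# Hypothetical rows for illustration only
rows = [(0, 0.8, "x"), (0, 1.5, "u"), (0, 2.9, "x"), (0.5, 2.9, "x"),
        (1.0, 3.5, "u"), (2.0, 4.0, "x"), (0, 5.0, "u"), (3.0, 5.0, "u")]

table = risk_sets(rows)
print(table)
print(product_limit(table))
```

Unlike the spreadsheet version, no sentinel such as 1E36 is needed because the unique observed values are collected in a set before sorting.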



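hazard rate function. Also, let $s(t)$ be the expected total number of observed
deaths prior to time $t$. It is reasonable to conclude that

$$s(t) = \int_0^t r(u)h(u)\,du.$$

Taking derivatives,

$$ds(t) = r(t)h(t)\,dt.$$

Then,

$$\frac{ds(t)}{r(t)} = h(t)\,dt.$$

Integrating both sides yields

$$\int_0^t \frac{ds(u)}{r(u)} = \int_0^t h(u)\,du = H(t).$$

Now replace the true expected count $s(t)$ by $\hat{s}(t)$, the observed number of
deaths by time $t$. It is a step function, increasing by $s_i$ at each death time.
Therefore, the left-hand side becomes

$$\sum_{i:\, y_i \le t} \frac{s_i}{r_i},$$

which defines the estimator, $\hat{H}(t)$. The Nelson-Aalen estimator is

$$\hat{H}(t) = \begin{cases}
0, & 0 \le t < y_1, \\
\sum_{i=1}^{j-1} \dfrac{s_i}{r_i}, & y_{j-1} \le t < y_j,\ j = 2, \ldots, k, \\
\sum_{i=1}^{k} \dfrac{s_i}{r_i}, & t \ge y_k,
\end{cases}$$

with $\hat{S}(t) = e^{-\hat{H}(t)}$. For $t \ge w$, alternative estimates are $\hat{S}(t) = 0$ and
$\hat{S}(t) = \hat{S}(y_k)^{t/w}$.

Example 11.4 Determine the Nelson-Aalen estimate of the survival function
for Data Set D2.

The estimated cumulative hazard rate function is

$$\hat{H}(t) = \begin{cases}
0, & 0 \le t < 0.8, \\
\frac{1}{30} = 0.0333, & 0.8 \le t < 2.9, \\
0.0333 + \frac{2}{26} = 0.1103, & 2.9 \le t < 3.1, \\
0.1103 + \frac{1}{26} = 0.1487, & 3.1 \le t < 4.0, \\
0.1487 + \frac{2}{26} = 0.2256, & 4.0 \le t < 4.1, \\
0.2256 + \frac{1}{23} = 0.2691, & 4.1 \le t < 4.8, \\
0.2691 + \frac{1}{21} = 0.3167, & t \ge 4.8,
\end{cases}$$

and the estimated survival function is

$$\hat{S}(t) = \begin{cases}
1, & 0 \le t < 0.8, \\
e^{-0.0333} = 0.9672, & 0.8 \le t < 2.9, \\
e^{-0.1103} = 0.8956, & 2.9 \le t < 3.1, \\
e^{-0.1487} = 0.8618, & 3.1 \le t < 4.0, \\
e^{-0.2256} = 0.7980, & 4.0 \le t < 4.1, \\
e^{-0.2691} = 0.7641, & 4.1 \le t < 4.8, \\
e^{-0.3167} = 0.7285, & 4.8 \le t < 5.0, \\
0.7285 \text{ or } 0 \text{ or } 0.7285^{t/5.0}, & t \ge 5.0.
\end{cases}$$

□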
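The Nelson-Aalen calculation for Data Set D2 is a running sum that is easy to verify. The sketch below takes the death times, risk sets, and death counts implied by the fractions in Example 11.4 as given; the function names and the choice of the extrapolation $\hat{S}(y_k)^{t/w}$ with $w = 5$ are illustrative.

```python
import math

# (y_j, r_j, s_j) for Data Set D2, as implied by the fractions in Example 11.4
d2_table = [(0.8, 30, 1), (2.9, 26, 2), (3.1, 26, 1),
            (4.0, 26, 2), (4.1, 23, 1), (4.8, 21, 1)]

def nelson_aalen_steps(table):
    """Return (y_j, H_hat(y_j), S_hat(y_j)) with S_hat = exp(-H_hat)."""
    h = 0.0
    steps = []
    for y, r, s in table:
        h += s / r
        steps.append((y, h, math.exp(-h)))
    return steps

def survival_beyond(steps, t, w):
    """One of the alternative estimates for t >= w: S_hat(y_k) ** (t / w)."""
    last_survival = steps[-1][2]
    return last_survival ** (t / w)

steps = nelson_aalen_steps(d2_table)
for y, h, s in steps:
    print(y, round(h, 4), round(s, 4))
print(round(survival_beyond(steps, 6.0, 5.0), 4))
```

At $t = w$ the extrapolated value equals the last estimate, so this alternative joins the step function continuously before decaying toward zero.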
Exercises

Repeat Example 11.2, treating "surrender" as "death." The easiest way
... then use the above formula. In





nice were observed at birth. An additional zu mice 
: 2 (days) and 30 more were first observed at age 4. 
ge 1, 10 at age 3, 10 at age 4, a at age 5, b at age 
ddition, 45 mice were lost to observation at age 7, 
ige 13. The following product-limit estimates were 
2 and 5 350 (13) = 0.856. Determine the values of a 


sored and no two lives died at the same age. At the time of the ninth death, 
the Nelson-Aalen estimate of the cumulative hazard rate is 0.511 and at the 
time of the tenth death it is 0.588. Estimate the value of the survival function 
at the time of the third death. 

11.9 (*) All members of a study joined at birth; however, some may leave
the study by means other than death. At the time of the third death, there
was one death (that is, $s_3 = 1$); at the time of the fourth death there were two
deaths; and at the time of the fifth death there was one death. The following
product-limit estimates were obtained: $S_n(y_3) = 0.72$, $S_n(y_4) = 0.60$, and
$S_n(y_5) = 0.50$. Determine the number of censored observations between times
$y_4$ and $y_5$. Assume no observations were censored at the death times.


11.2 MEANS, VARIANCES, AND INTERVAL ESTIMATION 

When all of the information is available, working with the empirical estimate 
of the survival function is straightforward. 

Example 11.5 Demonstrate that for complete data the empirical estimator 
of the survival function is unbiased and consistent. 

Recall that the empirical estimate of $S(x)$ is $S_n(x) = Y/n$, where $Y$ is the
number of observations in the sample that are greater than $x$. Then $Y$ must
have a binomial distribution with parameters $n$ and $S(x)$. Then,

$$E[S_n(x)] = E\left(\frac{Y}{n}\right) = \frac{nS(x)}{n} = S(x),$$

demonstrating that the estimator is unbiased. The variance is

$$\mathrm{Var}[S_n(x)] = \mathrm{Var}\left(\frac{Y}{n}\right) = \frac{S(x)[1 - S(x)]}{n},$$

which has a limit of zero, thus verifying consistency.


□ 
In order to make use of the result, the best we can do for the variance
is estimate it. It is unlikely we know the value of $S(x)$ because that is the
quantity we are trying to estimate. The estimated variance is given by

$$\widehat{\mathrm{Var}}[S_n(x)] = \frac{S_n(x)[1 - S_n(x)]}{n}.$$

The same results hold for empirically estimated probabilities. Let $p = \Pr(a <
X \le b)$. The empirical estimate of $p$ is $\hat{p} = S_n(a) - S_n(b)$. Arguments similar
to those used in the example above verify that $\hat{p}$ is unbiased and consistent,
with $\mathrm{Var}(\hat{p}) = p(1 - p)/n$.

When doing mortality studies or evaluating the effect of deductibles, we
sometimes are more interested in estimating conditional quantities.

Example 11.6 Using the full information from the observations in Data Set
D1, empirically estimate $q_2$.

For this data set, $n = 30$. By duration 2, one had died, and by duration 3,
three had died. Thus, $S_{30}(2) = \frac{29}{30}$ and $S_{30}(3) = \frac{27}{30}$. The empirical estimate
is

$$\hat{q}_2 = \frac{S_{30}(2) - S_{30}(3)}{S_{30}(2)} = \frac{2}{29}.$$
Example 11.7 Using Data Set B, empirically estimate the probability that a
payment will be at least 1,000 when there is a deductible of 250.

Empirically, there were 13 losses above the deductible, of which 4 exceeded
1,250 (the loss that produces a payment of 1,000). The empirical estimate
is 4/13. Using survival function notation, the estimator is $S_{20}(1{,}250)/S_{20}(250)$.
Once again, only a conditional variance can be obtained. The estimated
variance is $4(9)/13^3$. □
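The conditional estimate in Example 11.7 can be confirmed by counting. The sketch below uses Data Set B (Table 10.2) and conditions on the number of losses above the deductible, as the example does; the variable names are illustrative.

```python
# Data Set B (Table 10.2)
data_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

deductible = 250
payment_threshold = 1000

# Only losses above the deductible generate a payment
above_deductible = [x for x in data_b if x > deductible]
n_conditional = len(above_deductible)

# No loss in this data set equals 1,250 exactly, so the strict inequality
# gives the same count as "payment of at least 1,000"
n_large = sum(1 for x in above_deductible if x > deductible + payment_threshold)

p_hat = n_large / n_conditional
# Binomial variance, conditional on the number of losses above the deductible
var_hat = p_hat * (1 - p_hat) / n_conditional

print(n_conditional, n_large, p_hat, var_hat)
```

The variance is conditional because the denominator, the number of losses above the deductible, is itself random but is treated here as fixed.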


For grouped data, there is no problem if the survival function is to be 
estimated at a boundary. For interpolated values using the ogive, it is a bit 
more complex. 


Example 11.8 Determine the expected value and variance of the estimators
of the survival function and density function using the ogive and histogram,
respectively.

Suppose the value of $x$ is between the boundaries $c_{j-1}$ and $c_j$. Let $Y$ be
the number of observations at or below $c_{j-1}$ and let $Z$ be the number of
observations above $c_{j-1}$ and at or below $c_j$. Then

$$S_n(x) = 1 - \frac{Y(c_j - c_{j-1}) + Z(x - c_{j-1})}{n(c_j - c_{j-1})}$$

and

$$E[S_n(x)] = 1 - \frac{n[1 - S(c_{j-1})](c_j - c_{j-1}) + n[S(c_{j-1}) - S(c_j)](x - c_{j-1})}{n(c_j - c_{j-1})}
= S(c_{j-1})\frac{c_j - x}{c_j - c_{j-1}} + S(c_j)\frac{x - c_{j-1}}{c_j - c_{j-1}}.$$

This estimator is biased (although it is an unbiased estimator of the true
interpolated value). The variance is

$$\mathrm{Var}[S_n(x)] = \frac{(c_j - c_{j-1})^2\,\mathrm{Var}(Y) + (x - c_{j-1})^2\,\mathrm{Var}(Z) + 2(c_j - c_{j-1})(x - c_{j-1})\,\mathrm{Cov}(Y, Z)}{[n(c_j - c_{j-1})]^2},$$

where $\mathrm{Var}(Y) = nS(c_{j-1})[1 - S(c_{j-1})]$, $\mathrm{Var}(Z) = n[S(c_{j-1}) - S(c_j)][1 - S(c_{j-1}) + S(c_j)]$,
and $\mathrm{Cov}(Y, Z) = -n[1 - S(c_{j-1})][S(c_{j-1}) - S(c_j)]$. For the
density estimate,











Discrete data such as in Data Set A can be considered a special form of 
grouped data. Each discrete possibility is similar to an interval. 

Example 11.10 Demonstrate that for a discrete random variable the empir¬ 
ical estimator of a particular outcome is unbiased and consistent and derive 
its variance. 


Let $N_j$ be the number of times the value $x_j$ was observed in the sample.
Then $N_j$ has a binomial distribution with parameters $n$ and $p(x_j)$. The
empirical estimator is $p_n(x_j) = N_j/n$ and

$$E[p_n(x_j)] = E\left(\frac{N_j}{n}\right) = \frac{np(x_j)}{n} = p(x_j),$$

demonstrating that it is unbiased. Also,

$$\mathrm{Var}[p_n(x_j)] = \mathrm{Var}\left(\frac{N_j}{n}\right) = \frac{np(x_j)[1 - p(x_j)]}{n^2} = \frac{p(x_j)[1 - p(x_j)]}{n},$$

which goes to zero as $n \to \infty$, demonstrating consistency. □

Example 11.11 For Data Set A determine the empirical estimate of p(2) 
and estimate its variance. 


The empirical estimate is

$$p_{94,935}(2) = \frac{1{,}618}{94{,}935} = 0.017043$$

and its estimated variance is

$$\frac{0.017043(0.982957)}{94{,}935} = 1.7646 \times 10^{-7}.$$

It is possible to use the variances to construct confidence intervals for the
unknown probability.


Example 11.12 Use (9.3) and (9.4) to construct approximate 95% confidence
intervals for $p(2)$ using Data Set A.

From (9.3),

$$0.95 = \Pr\left(-1.96 \le \frac{p_n(2) - p(2)}{\sqrt{p(2)[1 - p(2)]/n}} \le 1.96\right).$$

Solve this by making the inequality an equality and then squaring both sides
to obtain [dropping the argument of (2) for simplicity],

$$\frac{(p_n - p)^2 n}{p(1 - p)} = 1.96^2,$$

$$np_n^2 - 2npp_n + np^2 = 1.96^2 p - 1.96^2 p^2,$$

$$0 = (n + 1.96^2)p^2 - (2np_n + 1.96^2)p + np_n^2.$$

The solution is

$$p = \frac{2np_n + 1.96^2 \pm \sqrt{(2np_n + 1.96^2)^2 - 4(n + 1.96^2)np_n^2}}{2(n + 1.96^2)},$$

which provides the two endpoints of the confidence interval. Inserting the
numbers from Data Set A ($p_n = 0.017043$, $n = 94{,}935$) produces a confidence
interval of (0.016239, 0.017886).

Equation (9.4) provides the confidence interval directly as

$$p_n \pm 1.96\sqrt{\frac{p_n(1 - p_n)}{n}}.$$

Inserting the numbers from Data Set A gives $0.017043 \pm 0.000823$, for an
interval of (0.016220, 0.017866). The answers for the two methods are very
similar, which is to be expected when the sample size is large. The results
are reasonable because it is well known that the normal distribution is a good
approximation to the binomial. □
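Both intervals in Example 11.12 can be reproduced with a few lines of arithmetic. The sketch below solves the quadratic from (9.3) and applies the direct formula from (9.4) for Data Set A; the variable names are illustrative, and the exact ratio 1,618/94,935 is used instead of the rounded value 0.017043, so the results agree with the text to about five decimal places.

```python
import math

n = 94935        # number of drivers in Data Set A
count = 1618     # drivers with exactly two accidents
p_n = count / n  # empirical estimate of p(2)
z = 1.96         # standard normal multiplier for 95% confidence

# Interval from (9.3): roots of (n + z^2) p^2 - (2 n p_n + z^2) p + n p_n^2 = 0
a = n + z**2
b = -(2 * n * p_n + z**2)
c = n * p_n**2
root = math.sqrt(b * b - 4 * a * c)
quadratic_interval = ((-b - root) / (2 * a), (-b + root) / (2 * a))

# Interval from (9.4): p_n plus or minus z times the estimated standard deviation
half_width = z * math.sqrt(p_n * (1 - p_n) / n)
direct_interval = (p_n - half_width, p_n + half_width)

print([round(v, 6) for v in quadratic_interval])
print([round(v, 6) for v in direct_interval])
```

The quadratic interval is not symmetric about the estimate, which is why its endpoints sit slightly above those of the direct interval.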


When data are censored or truncated, the matter becomes more complex. 
Counts no longer have the binomial distribution and therefore the distribution 
of the estimator is harder to obtain. While there are proofs available to back 
up the results presented here, they will not be provided. Instead, an attempt 
will be made to indicate why the results are reasonable. 

Consider the Kaplan-Meier product-limit estimator of $S(t)$. It is the product
of a number of terms of the form $(r_j - s_j)/r_j$, where $r_j$ was viewed as the
number available to die at age $y_j$ and $s_j$ is the number who actually did so.
Assume that the death ages and the number available to die are fixed, so that
the value of $s_j$ is the only random quantity. As a random variable, $s_j$ has
a binomial distribution based on a sample of $r_j$ lives and success probability
$[S(y_{j-1}) - S(y_j)]/S(y_{j-1})$. The probability arises from the fact that those
available to die were known to be alive at the previous death age. For one of
these terms,

$$E\left(\frac{r_j - s_j}{r_j}\right) = \frac{r_j - r_j[S(y_{j-1}) - S(y_j)]/S(y_{j-1})}{r_j} = \frac{S(y_j)}{S(y_{j-1})}.$$

That is, this ratio is an unbiased estimator of the probability of surviving
from one death age to the next one. Furthermore,

$$\mathrm{Var}\left(\frac{r_j - s_j}{r_j}\right)
= \frac{r_j\,\dfrac{S(y_{j-1}) - S(y_j)}{S(y_{j-1})}\left[1 - \dfrac{S(y_{j-1}) - S(y_j)}{S(y_{j-1})}\right]}{r_j^2}
= \frac{[S(y_{j-1}) - S(y_j)]\,S(y_j)}{r_j\,S(y_{j-1})^2}.$$

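Now consider the estimated survival probability at one of the death ages. Its
... variance of a product of independent random variables. Let $X_1, \ldots, X_n$ be
independent random variables where $E(X_j) = \mu_j$ and $\mathrm{Var}(X_j) = \sigma_j^2$. Then,

$$\mathrm{Var}(X_1 \cdots X_n) = E(X_1^2 \cdots X_n^2) - E(X_1 \cdots X_n)^2$$
$$= E(X_1^2) \cdots E(X_n^2) - E(X_1)^2 \cdots E(X_n)^2$$
$$= (\mu_1^2 + \sigma_1^2) \cdots (\mu_n^2 + \sigma_n^2) - \mu_1^2 \cdots \mu_n^2.$$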

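For the product-limit estimator,

$$\mathrm{Var}[S_n(y_j)] = \mathrm{Var}\left[\prod_{i=1}^{j} \frac{r_i - s_i}{r_i}\right]$$
$$= \prod_{i=1}^{j}\left[\frac{S(y_i)^2}{S(y_{i-1})^2} + \frac{[S(y_{i-1}) - S(y_i)]\,S(y_i)}{r_i\,S(y_{i-1})^2}\right] - \frac{S(y_j)^2}{S(y_0)^2}$$
$$= \prod_{i=1}^{j}\left[\frac{S(y_i)^2}{S(y_{i-1})^2} \cdot \frac{r_i S(y_i) + [S(y_{i-1}) - S(y_i)]}{r_i\,S(y_i)}\right] - \frac{S(y_j)^2}{S(y_0)^2}$$
$$= \frac{S(y_j)^2}{S(y_0)^2}\left\{\prod_{i=1}^{j}\left[1 + \frac{S(y_{i-1}) - S(y_i)}{r_i\,S(y_i)}\right] - 1\right\}.$$

The following approximation is based on the fact that for any set of
small numbers $a_1, \ldots, a_n$, the product $(1 + a_1) \cdots (1 + a_n)$ is approximately
$1 + a_1 + \cdots + a_n$. The result is the approximation

$$\mathrm{Var}[S_n(y_j)] \doteq \frac{S(y_j)^2}{S(y_0)^2} \sum_{i=1}^{j} \frac{S(y_{i-1}) - S(y_i)}{r_i\,S(y_i)}.$$

Because it is unlikely that the survival function is known, an estimated value
needs to be inserted. Recall that the estimated value of $S(y_j)$ is actually
conditional on being alive at age $y_0$. Also, $(r_i - s_i)/r_i$ is an estimate of
$S(y_i)/S(y_{i-1})$. Then,

$$\widehat{\mathrm{Var}}[S_n(y_j)] = S_n(y_j)^2 \sum_{i=1}^{j} \frac{s_i}{r_i(r_i - s_i)}. \tag{11.3}$$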

Equation (11.3) is known as Greenwood’s approximation. It is the only version 
that will be used in this text. 
Example 11.13 Using Data Set D1, estimate the variance of $S_{30}(3)$ both
directly and using Greenwood's formula. Do the same for ${}_2\hat{q}_3$.

Because there is no censoring or truncation, the empirical formula can be
used to directly estimate this variance. There were three deaths out of 30
individuals, and therefore

$$\widehat{\mathrm{Var}}[S_{30}(3)] = \frac{(3/30)(27/30)}{30} = \frac{81}{30^3}.$$

For Greenwood's approximation, $r_1 = 30$, $s_1 = 1$, $r_2 = 29$, and $s_2 = 2$. The
approximation is

$$\left(\frac{27}{30}\right)^2\left(\frac{1}{30(29)} + \frac{2}{29(27)}\right) = \frac{81}{30^3}.$$

It can be demonstrated that when there is no censoring or truncation the two
formulas produce the same answer.

With regard to ${}_2\hat{q}_3$, arguing as in Example 11.6 produces an estimated
(conditional) variance of

$$\frac{(4/27)(23/27)}{27} = \frac{92}{27^3}.$$

For Greenwood's formula, we first must note that we are estimating

$${}_2q_3 = \frac{S(3) - S(5)}{S(3)} = 1 - \frac{S(5)}{S(3)}.$$

As with the empirical estimate, all calculations must be done given the 27
people alive at duration 3. Furthermore, the variance of ${}_2\hat{q}_3$ is the same as the
variance of $\hat{S}(5)$ using only information from duration 3 and beyond. Starting
from duration 3 there are three death times, 3.1, 4.0, and 4.8, with $r_1 = 27$,
$r_2 = 26$, $r_3 = 25$, $s_1 = 1$, $s_2 = 1$, and $s_3 = 2$. Greenwood's approximation is

$$\left(\frac{23}{27}\right)^2\left(\frac{1}{27(26)} + \frac{1}{26(25)} + \frac{2}{25(23)}\right) = \frac{92}{27^3}.$$

□

Example 11.14 Repeat the previous example, this time using all 40
observations in Data Set D2 and the incomplete information due to censoring and
truncation.

For this example, the direct empirical approach is not available. That is
because the sample size is not well defined (it varies over time as subjects
enter and leave due to truncation and censoring). From Example 11.2, the
relevant values within the first three years are $r_1 = 30$, $r_2 = 26$, $s_1 = 1$, and
$s_2 = 2$. From Example 11.3, $S_{40}(3) = 0.8923$. Then, Greenwood's estimate is

$$(0.8923)^2\left(\frac{1}{30(29)} + \frac{2}{26(24)}\right) = 0.0034671.$$

An approximate 95% confidence interval can be constructed using the normal
approximation. It is

$$0.8923 \pm 1.96\sqrt{0.0034671} = 0.8923 \pm 0.1154,$$

which corresponds to the interval (0.7769, 1.0077). For small sample sizes, it
is possible that the confidence intervals admit values less than 0 or greater
than 1.

With regard to ${}_2\hat{q}_3$, the relevant quantities are (starting at duration 3, but
retaining the subscripts from the earlier examples for $r$ and $s$) $r_3 = 26$,
$r_4 = 26$, $r_5 = 23$, $r_6 = 21$, $s_3 = 1$, $s_4 = 2$, $s_5 = 1$, and $s_6 = 1$. Greenwood's
estimate is

$$\left(\frac{0.7215}{0.8923}\right)^2\left(\frac{1}{26(25)} + \frac{2}{26(24)} + \frac{1}{23(22)} + \frac{1}{21(20)}\right) = 0.0059502.$$

□


The previous example indicated that the usual method of constructing a
confidence interval can lead to an unacceptable result. An alternative approach
can be constructed as follows. Let $Y = \ln[-\ln S_n(t)]$. Using the
delta method (see Theorem 12.17), the variance of $Y$ can be approximated as
follows. The function of interest is $g(x) = \ln(-\ln x)$. Its derivative is

$$g'(x) = \frac{1}{-\ln x} \cdot \frac{-1}{x} = \frac{1}{x \ln x}.$$

According to the delta method, the variance of $Y$ can be approximated by

$$\{g'[E(S_n(t))]\}^2\,\mathrm{Var}[S_n(t)] = \frac{\mathrm{Var}[S_n(t)]}{[S_n(t) \ln S_n(t)]^2},$$

where we have used the fact that $S_n(t)$ is an unbiased estimator of $S(t)$. Then,
an estimated 95% confidence interval for $\theta = \ln[-\ln S(t)]$ is

$$\ln[-\ln S_n(t)] \pm 1.96\,\frac{\sqrt{\widehat{\mathrm{Var}}[S_n(t)]}}{S_n(t) \ln S_n(t)}.$$

Because $S(t) = \exp(-e^{\theta})$, putting each endpoint through this formula will
provide a confidence interval for $S(t)$. For the upper limit we have (where
$\hat{v} = \widehat{\mathrm{Var}}[S_n(t)]$)

$$\exp\left\{-e^{\ln[-\ln S_n(t)] + 1.96\sqrt{\hat{v}}/[S_n(t) \ln S_n(t)]}\right\}$$

$$= \exp\left\{[\ln S_n(t)]\,e^{1.96\sqrt{\hat{v}}/[S_n(t) \ln S_n(t)]}\right\}$$

$$= S_n(t)^U, \quad U = \exp\left[\frac{1.96\sqrt{\hat{v}}}{S_n(t) \ln S_n(t)}\right].$$

Similarly, the lower limit is $S_n(t)^{1/U}$. This interval will always be inside the
range zero to 1 and is referred to as the log-transformed confidence interval.

Example 11.15 Obtain the log-transformed confidence interval for $S(3)$ as
in Example 11.14.

We have

$$U = \exp\left[\frac{1.96\sqrt{0.0034671}}{0.8923 \ln(0.8923)}\right] = 0.32142.$$

The lower limit of the interval is $0.8923^{1/0.32142} = 0.70150$ and the upper limit
is $0.8923^{0.32142} = 0.96404$. □
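
The log-transformed interval can be sketched in a few lines of Python. This is only an illustration under the formulas above; the function name `log_transformed_interval` and its argument names are hypothetical, and the inputs are the values from Example 11.14.

```python
import math


def log_transformed_interval(s_hat, var_hat, z=1.96):
    """Return (lower, upper, U) for the log-transformed interval for S(t).

    U = exp[z * sqrt(var_hat) / (s_hat * ln s_hat)]; because ln s_hat < 0,
    U is below 1, so s_hat ** U is the upper limit and s_hat ** (1 / U)
    is the lower limit.
    """
    u = math.exp(z * math.sqrt(var_hat) / (s_hat * math.log(s_hat)))
    return s_hat ** (1.0 / u), s_hat ** u, u


lower, upper, u = log_transformed_interval(0.8923, 0.0034671)
print(round(u, 5), round(lower, 5), round(upper, 5))
```

Unlike the linear interval (0.7769, 1.0077), both endpoints necessarily fall between 0 and 1.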
Example 11.16 Construct an approximate 95% confidence interval for $H(3)$
by each formula using all 40 observations in Data Set D2.

The point estimate is $\hat{H}(3) = \frac{1}{30} + \frac{2}{26} = 0.11026$. The estimated variance
is $\frac{1}{30^2} + \frac{2}{26^2} = 0.0040697$. The linear confidence interval is

$$0.11026 \pm 1.96\sqrt{0.0040697} = 0.11026 \pm 0.12504$$

for an interval of $(-0.01478, 0.23530)$. For the log-transformed interval,

$$U = \exp\left[\pm\frac{1.96(0.0040697)^{1/2}}{0.11026}\right] = \exp(\pm 1.13402) = 0.32174 \text{ to } 3.10813.$$

The interval is 0.11026(0.32174) = 0.03548 to 0.11026(3.10813) = 0.34270. □
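
The calculation in Example 11.16 can be reproduced with a short Python sketch. It assumes the usual Nelson-Aalen point estimate $\sum s_i/r_i$ and variance estimate $\sum s_i/r_i^2$, which is consistent with the value 0.0040697 above; the function name `nelson_aalen_intervals` is hypothetical.

```python
import math


def nelson_aalen_intervals(risk_sets, deaths, z=1.96):
    """Point estimate, variance estimate, and both intervals for H(t)."""
    h_hat = sum(s / r for r, s in zip(risk_sets, deaths))
    var_hat = sum(s / r ** 2 for r, s in zip(risk_sets, deaths))
    half_width = z * math.sqrt(var_hat)
    linear = (h_hat - half_width, h_hat + half_width)
    u = math.exp(half_width / h_hat)  # multiplier for the log-transformed interval
    log_transformed = (h_hat / u, h_hat * u)
    return h_hat, var_hat, linear, log_transformed


h_hat, var_hat, linear, log_transformed = nelson_aalen_intervals([30, 26], [1, 2])
print(h_hat, var_hat, linear, log_transformed)
```

Small differences in the last decimal place relative to the example arise because the example rounds intermediate values.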

11.2.1 Exercises 

11.10 Using the full information from Data Set D1, empirically estimate $q_j$
for $j = 0, \ldots, 4$ and $_5p_0$ where the variable of interest is time of surrender.
Estimate the variance of each of your estimators. Identify which estimated
variances are conditional. Interpret $_5q_0$ as the probability of surrendering
before the five years expire.


having two or more accidents and estimate its variance.

11.12 Repeat Example 11.13 using time to surrender as the variable. 

11.13 Repeat Example 11.14 using time to surrender as the variable. Interpret
$_2q_3$ as the probability of surrendering before the five years expire.

11.14 Obtain the log-transformed confidence interval for $S(3)$ in Exercise
11.13.

11.15 Construct 95% confidence intervals for H(3) by each formula using all 
40 observations in Data Set D2 with surrender being the variable of interest. 

11.16 (*) Ten individuals were observed from birth. All were observed until

Table 11.4 Data for Exercise 11.16

Age    Number of deaths
  2           1
  3           1
  5           1
  7           2
 10           1
 12           2
 13           1
 14           1


five years they are 10 and 3. Determine Greenwood's approximation to the
variance of $\hat{S}(4)$.

11.18 (*) Observations can be censored, but there is no truncation. Let $y_j$
and $y_{j+1}$ be consecutive death ages. A 95% linear confidence interval for
$H(y_j)$ using the Nelson-Aalen estimator is (0.07125, 0.22875) while a similar
interval for $H(y_{j+1})$ is (0.15607, 0.38635). Determine $s_{j+1}$.

11.19 (*) A mortality study is conducted on 50 lives, all observed from age 
Table 11.6 Data for Exercise 11.24

$t_j$    $s_j$    $r_j$
 10       1       20
 34       1       19
 47       1       18
 75       1       17
156       1       16
171       1       15


11.23 Construct a kernel density estimate for the time to surrender for Data 
Set D2. Be aware of the fact that this is a mixed distribution (probability is 
continuous from 0 to 5 but is discrete at 5). 


11.24 (*) You are given the data in Table 11.6 on time to death. Using the
uniform kernel with a bandwidth of 60, determine $\hat{f}(100)$.





set allows these values to be accumulated and from there only this reduced 
set of values needs to be processed. 


11.4.2 Kaplan-Meier type approximations 


In order to apply the Kaplan-Meier formula, assumptions must be made about 



the distribution function at the given endpoints and then smooth the function
by interpolation (usually linear) between successive values. Then,


S( Cj ) - S(c j+1 ) 
S(cj) 


l -~) - n : = 。 
n i_1 ( I 』、 


T 3) ^3 


This is the traditional form of a life table estimator where the numerator has 
the number of observed deaths and the denominator is a measure of exposure 
(the number of lives available to die). For this formula, all who enter the 


MM 
done so and that $100\beta\%$ of those who will be censored (be counted in $u_j$) have
done so. Then the risk set is $r_j = P_j + \alpha d_j - \beta u_j$. Equation (11.5) is obtained
by setting $\alpha = 1$ and $\beta = 0$. An alternative is to set $\alpha = \beta = 0.5$. This is
equivalent to having the entrants and exits spread uniformly throughout the
interval. The result is $r_j = P_j + 0.5(d_j - u_j)$. Because $P_{j+1} = P_j + d_j - u_j - x_j$,
this can be rewritten as

$$r_j = 0.5(P_j + P_{j+1} + x_j), \quad (11.6)$$

a formula commonly used in constructing mortality tables. At times it may be
necessary to make different assumptions for different intervals, as illustrated
in Example 11.20.
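
The two forms of the risk set can be checked against each other numerically. The Python sketch below is illustrative only: the function names and the interval counts (`p`, `d`, `u`, `x`) are hypothetical values chosen for the check, not data from the text.

```python
def risk_set(p, d, u, alpha, beta):
    """r_j = P_j + alpha * d_j - beta * u_j."""
    return p + alpha * d - beta * u


def risk_set_uniform(p, p_next, x):
    """Equation (11.6): r_j = 0.5 * (P_j + P_{j+1} + x_j), i.e. alpha = beta = 0.5."""
    return 0.5 * (p + p_next + x)


# Hypothetical interval: 100 lives at the start, 20 entrants, 10 censored, 5 deaths.
p, d, u, x = 100, 20, 10, 5
p_next = p + d - u - x  # P_{j+1} = P_j + d_j - u_j - x_j

r_general = risk_set(p, d, u, alpha=0.5, beta=0.5)
r_uniform = risk_set_uniform(p, p_next, x)
print(r_general, r_uniform)
```

Both expressions give the same risk set, which is the identity used to pass from $r_j = P_j + 0.5(d_j - u_j)$ to (11.6).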


11.4.3 Multiple-decrement tables

The goal of all the estimation procedures in this text is to deduce the
probability distribution for the variable of interest in the absence of truncation
and censoring. For loss data, that would be the probabilities if there were no
deductible or limit. For lifetime data it would be the probabilities if we could
follow people from birth to death. In the language of Actuarial Mathematics
[16], these are called single-decrement rates and are denoted $q_j'$. It is often
desired to create a table of multiple-decrement probabilities, denoted $q_j$. A
superscript identifies the decrement of interest. For examp' 

Ah (d), withdrawal ㈣， and retirement (r 
a person age cj withdraws prior to age c j+ 
ment where death and retirement are not possible while q^ r) is the probability 
that a person age Cj retires prior to age in an environment where prior 
death or withdrawal eliminates the chance of retirement. Multiple-decrement 
tables are often constructed by obtaining the single-decrement rates from dif¬ 
ferent sources and then combining with a formula such as 




f 練尸， g f) 


n^-^) - 

5=1 


where $s$ represents an arbitrary decrement and $m$ is the total number of
decrements.


Example 11.20 Estimate both single- and multiple-decrement quantities using
Data Set D2 and the methods of this section. Make reasonable assumptions.


First consider the decrement death. In the notation of this section, the
relevant quantities are in Table 11.7. For this setting, more than one assumption
is needed. For $d_0 = 32$ it is clear that the 30 values that are exactly

ZG1TO sllfMll/*! ViO 'f , T , £SQ"l*/a/^ OC* f J C. _ a 1- — 1— • \ i .1 
Table 11.7 Single-decrement calculations for Example 11.20
u i 叼 Pj rj < d ) 


Table 11.8 Multiple-decrement calculations for Example 11.20 



The 2 policies that entered after issue require an assumption. It makes sense
to assume a uniform spread. Then $r_0 = 30 + 0.5(2) - 0.5(3) = 29.5$. The other
$r$ values are calculated using (11.6). Also note that the 17 policies that were
still active after five years are all assumed to be censored at time 5, rather
than be spread uniformly through the fifth year.

For withdrawals the values of $q_j'^{(w)}$ are given in Table 11.8 along with the
calculated multiple-decrement values using (11.7). □

Example 11.21 Loss data for policies with deductibles of 0, 250, and 500
and policy limits of 5,000, 7,500, and 10,000 were collected. The data are

ITL _/ n.hl.P. II Q TTcO "fit a rm ^ J."L - . . _ 


•10. Because the deductibles and limits 
oly reasonable assumption is a = 1 and 
□ 



re issued, it is customary to assign a 
;hen the premium associated with that 
changes from “What is the probability 







Table 11.12 Data for Exercise 11.27

Deductible   Payment     Deductible   Payment
   250        2,221         500        3,660
   250        2,500*        500          215
   250          207         500        1,302
   250        3,735         500       10,000*
   250        5,000*      1,000        1,643
   250          517       1,000        3,395
   250        5,743       1,000        3,981
   500        2,500*      1,000        3,836
   500          525       1,000        5,000*
   500        4,393       1,000        1,850
   500        5,000*      1,000        6,722

* Amount paid was at the policy limit


0.5 a reasonable assumption . 6 For the data in Table 11.11 estimate ^45 and 分必 
using both the exact Kaplan-Meier estimate and the method of this section. 

11.27 Twenty-two insurance payments are recorded in Table 11.12. Use the 
fewest reasonable number of intervals and the method of this section to esti¬ 
mate the probability that a policy with a deductible of 500 will have a payment 
in excess of 5,000. 


6 However, just as in Example 11.20 where different $d$ values received different treatment,
here there can be $u$ values that deserve different treatment. Observation of an individual
can end because the individual leaves or because the observation period ends. For studies
of insured lives it is common for observation to end at a policy anniversary and thus at a
whole-number insuring age. For them, $\beta = 0$ is an appropriate assumption.
Table 11.11 Data for Exercise 11.26


Table 11.10 Calculations for Example 11.21

$c_j$      $d_j$    $u_j$   $x_j$   $P_j$    $r_j$    $q_j'^{(d)}$   $F(c_j)$
0           379       0      15        0      379     0.0396     0.0000
100           0       0      16      364      364     0.0440     0.0396
250         903       0     130      348    1,251     0.1039     0.0818
500       1,176       0     499    1,121    2,297     0.2172     0.1772
1,000         0       0     948    1,798    1,798     0.5273     0.3560
2,500         0      42     607      850      850     0.7141     0.6955
5,000         0      30     148      201      201     0.7363     0.9130
7,500         0       7      16       23       23     0.6957     0.9770
10,000                                                           0.9930
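
The columns of Table 11.10 follow mechanically from the counts $d_j$, $u_j$, and $x_j$. The Python sketch below recomputes them, assuming $\alpha = 1$ and $\beta = 0$ (all entrants counted from the start of the interval, all censored observations counted until its end); the function name `life_table` is hypothetical.

```python
def life_table(d, u, x, alpha=1.0, beta=0.0):
    """Return rows (P_j, r_j, q_j, F(c_j)) and the distribution function at the last boundary."""
    rows = []
    p = 0.0          # P_0: nobody is under observation before the first interval
    survival = 1.0   # running product of (1 - q_j)
    for dj, uj, xj in zip(d, u, x):
        r = p + alpha * dj - beta * uj   # risk set for the interval
        q = xj / r                       # estimated probability of a loss in the interval
        rows.append((p, r, q, 1.0 - survival))
        survival *= 1.0 - q
        p = p + dj - uj - xj             # P_{j+1} = P_j + d_j - u_j - x_j
    return rows, 1.0 - survival


# Counts from Table 11.10 for the intervals starting at 0, 100, ..., 7,500.
d = [379, 0, 903, 1176, 0, 0, 0, 0]
u = [0, 0, 0, 0, 0, 42, 30, 7]
x = [15, 16, 130, 499, 948, 607, 148, 16]

rows, f_last = life_table(d, u, x)
for p_j, r_j, q_j, f_j in rows:
    print(f"{p_j:7.0f} {r_j:7.0f} {q_j:.4f} {f_j:.4f}")
print(f"{f_last:.4f}")
```

Changing `alpha` and `beta` shows how sensitive the estimates are to the assumed timing of entries and exits.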


someone having their $x$th birthday dies in the next year?" to "What is the
probability that a person who was assigned age $x - k$ at issue $k$ years ago
dies in the next year?" Such age assignments are called "insuring ages" and
effectively move the birthday to the policy issue date. As a result, insured
lives tend to enter observation on their birthday (artificially assigned as the
policy issue date). This makes the $d$ value exact (with $\alpha = 1$) rather than
approximate. However, withdrawal can take place at any time, making $\beta$ =


2 4 
6 . 6 . 
4 4 


45.47.47.46. 


45464646464646 


3 7 4 
45.46.45. 


00 

4G.46. 


0 4 
7.5. 
4 4 


5 5 5 5 5 5 5 
4 4 4 4 4 4 4 


15163099480748164230 
14 9 6 1 


18 18 18 5 
5 7 18 111 
2 4 3 


65938470 
9 7 3 14 11 


0000 

G Goo. 50 ， ,00,500,00 o00 
Part IV


Parametric statistical 
methods 
Parameter estimation


If a phenomenon is to be modeled using a parametric model, it is necessary to
assign values to the parameters. This could be done arbitrarily, but it would
seem to be more reasonable to base the assignment on observations from that
phenomenon. In particular, we will assume that $n$ independent observations
have been collected. For some of the techniques it will be further assumed
that all the observations are from the same random variable. For others, that
restriction will be relaxed.

The methods introduced in Section 12.1 are relatively easy to implement
but tend to give poor results. Section 12.2 covers maximum likelihood
estimation. This method is more difficult to use but has superior statistical
properties and is considerably more flexible.

12.1 METHOD OF MOMENTS AND PERCENTILE MATCHING 

For these methods we assume that all $n$ observations are from the same
parametric distribution. In particular, let the distribution function be given by

$$F(x \mid \theta), \quad \theta^T = (\theta_1, \theta_2, \ldots, \theta_p),$$

where $\theta^T$ is the transpose of $\theta$. That is, $\theta$ is a column vector containing the
$p$ parameters to be estimated. Furthermore, let $\mu_k'(\theta) = E(X^k \mid \theta)$ be the $k$th
raw moment and let $\pi_g(\theta)$ be the $100g$th percentile of the random variable.
That is, $F[\pi_g(\theta) \mid \theta] = g$. If the distribution function is continuous, there will
be at least one solution to that equation.

For a sample of $n$ independent observations from this random variable, let
$\hat{\mu}_k' = \frac{1}{n}\sum_{j=1}^{n} x_j^k$ be the empirical estimate of the $k$th moment and let $\hat{\pi}_g$ be
the empirical estimate of the $100g$th percentile.

Definition 12.1 A method-of-moments estimate of $\theta$ is any solution of
the $p$ equations

$$\mu_k'(\theta) = \hat{\mu}_k', \quad k = 1, 2, \ldots, p.$$

The motivation for this estimator is that it produces a model that has the 
same first p raw moments as the data (as represented by the empirical dis¬ 
tribution). The traditional definition of the method of moments uses positive 
integers for the moments. Arbitrary negative or fractional moments could also 
be used. In particular, when estimating parameters for inverse distributions, 
matching negative moments may be a superior approach. 1 

Example 12.2 Use the method of moments to estimate parameters for the
exponential, gamma, and Pareto distributions for Data Set B from Chapter
10.

The first two sample moments are

$$\hat{\mu}_1' = \tfrac{1}{20}(27 + \cdots + 15{,}743) = 1{,}424.4,$$

$$\hat{\mu}_2' = \tfrac{1}{20}(27^2 + \cdots + 15{,}743^2) = 13{,}238{,}441.9.$$

For the exponential distribution the equation is

$$\theta = 1{,}424.4$$

with the obvious solution, $\hat{\theta} = 1{,}424.4$.

For the gamma distribution, the two equations are

$$E(X) = \alpha\theta = 1{,}424.4,$$

$$E(X^2) = \alpha(\alpha + 1)\theta^2 = 13{,}238{,}441.9.$$

Dividing the second equation by the square of the first equation yields

$$\frac{\alpha + 1}{\alpha} = 6.52489, \quad 1 = 5.52489\alpha$$

1 One advantage is that with appropriate moments selected the equations must have a 
solution within the range of allowable parameter values. 



There is no guarantee that the equations will have a solution or, if there is 
a solution, that it will be unique. 
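
For the gamma case of Example 12.2 the two moment equations can be solved in closed form. The Python sketch below is a minimal illustration; the function name `gamma_method_of_moments` is hypothetical, and the inputs are the two sample moments quoted in the example.

```python
def gamma_method_of_moments(m1, m2):
    """Solve alpha * theta = m1 and alpha * (alpha + 1) * theta**2 = m2."""
    variance = m2 - m1 ** 2      # equals alpha * theta**2
    if variance <= 0:
        raise ValueError("the moment equations have no admissible solution")
    alpha = m1 ** 2 / variance
    theta = variance / m1
    return alpha, theta


alpha_hat, theta_hat = gamma_method_of_moments(1424.4, 13238441.9)
print(alpha_hat, theta_hat)
```

The resulting value of `alpha_hat` satisfies $1 = 5.52489\alpha$ up to rounding. The explicit check on the variance reflects the remark that a solution need not exist.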

Definition 12.3 A percentile matching estimate of $\theta$ is any solution of
the $p$ equations

$$\pi_{g_k}(\theta) = \hat{\pi}_{g_k}, \quad k = 1, 2, \ldots, p,$$

where $g_1, g_2, \ldots, g_p$ are $p$ arbitrarily chosen percentiles. From the definition
of percentile, the equations can also be written

$$F(\hat{\pi}_{g_k} \mid \theta) = g_k, \quad k = 1, 2, \ldots, p.$$

The motivation for this estimator is that it produces a model with $p$
percentiles that match the data (as represented by the empirical distribution).
As with the method of moments, there is no guarantee that the equations will
have a solution or, if there is a solution, that it will be unique. One problem
with this definition is that percentiles for discrete random variables (such as
the empirical distribution) are not always well defined. For example, Data
Set B has 20 observations. Any number between 384 and 457 has 10 observations
below and 10 above and so could serve as the median. The convention
is to use the midpoint. However, for other percentiles, there is no "official"
interpolation scheme.2 The following definition will be used here.

Definition 12.4 The smoothed empirical estimate of a percentile is found
by

$$\hat{\pi}_g = (1 - h)x_{(j)} + h x_{(j+1)}, \text{ where}$$

$$j = \lfloor (n + 1)g \rfloor \text{ and } h = (n + 1)g - j.$$

Here $\lfloor \cdot \rfloor$ indicates the greatest integer function and $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$
are the order statistics from the sample.

2 Hyndman and Fan [65] present nine different methods. They recommend a slight
modification of the one presented here using $j = \lfloor g(n + \frac{1}{3}) + \frac{1}{3} \rfloor$ and $h = g(n + \frac{1}{3}) + \frac{1}{3} - j$.


Unless there are two or more data points with the same value, no two
percentiles will have the same value. One feature of this definition is that $\hat{\pi}_g$
cannot be obtained for $g < 1/(n+1)$ or $g > n/(n+1)$. This seems reasonable as
we should not expect to be able to infer the value of large or small percentiles
from small samples. We will use the smoothed version whenever an empirical
percentile estimate is called for.
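
A minimal Python sketch of the smoothed empirical percentile of Definition 12.4 follows. The function name `smoothed_percentile` is hypothetical. The list `data_set_b` is an assumption: it is Data Set B as it is understood here (this section quotes only some of its 20 values), and it is consistent with the sample mean 1,424.4 quoted earlier.

```python
import math


def smoothed_percentile(sample, g):
    """Smoothed empirical estimate of the 100g-th percentile (Definition 12.4)."""
    xs = sorted(sample)
    n = len(xs)
    position = (n + 1) * g
    j = math.floor(position)
    h = position - j
    # Not available for g < 1/(n+1) or g > n/(n+1).
    if j < 1 or j > n or (j == n and h > 0):
        raise ValueError("percentile cannot be obtained for this g")
    if j == n:
        return float(xs[n - 1])
    # xs is 0-indexed, so x_(j) is xs[j - 1] and x_(j+1) is xs[j].
    return (1 - h) * xs[j - 1] + h * xs[j]


# Assumed values of Data Set B (20 observations).
data_set_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

print(smoothed_percentile(data_set_b, 0.5))
print(smoothed_percentile(data_set_b, 0.3))
print(smoothed_percentile(data_set_b, 0.8))
```

For $g = 0.5$ and an even sample size the definition reduces to the usual midpoint median.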


Example 12.5 Use percentile matching to estimate parameters for the
exponential and Pareto distributions for Data Set B.

For the exponential distribution, select the 50th percentile. The empirical
estimate is the traditional median of $\hat{\pi}_{0.5} = (384 + 457)/2 = 420.5$ and the
equation to solve is

$$0.5 = F(420.5 \mid \theta) = 1 - e^{-420.5/\theta},$$

$$\ln 0.5 = \frac{-420.5}{\theta},$$

$$\hat{\theta} = \frac{-420.5}{\ln 0.5} = 606.65.$$

For the Pareto distribution, select the 30th and 80th percentiles. The
smoothed empirical estimates are found as follows:

30th: $j = \lfloor 21(0.3) \rfloor = \lfloor 6.3 \rfloor = 6$, $h = 6.3 - 6 = 0.3$,

$\hat{\pi}_{0.3} = 0.7(161) + 0.3(243) = 185.6$,

80th: $j = \lfloor 21(0.8) \rfloor = \lfloor 16.8 \rfloor = 16$, $h = 16.8 - 16 = 0.8$,

$\hat{\pi}_{0.8} = 0.2(1{,}193) + 0.8(1{,}340) = 1{,}310.6$.

The equations to solve are

$$0.3 = F(185.6) = 1 - \left(\frac{\theta}{185.6 + \theta}\right)^{\alpha},$$

$$0.8 = F(1{,}310.6) = 1 - \left(\frac{\theta}{1{,}310.6 + \theta}\right)^{\alpha},$$

$$\ln 0.7 = -0.356675 = \alpha \ln\left(\frac{\theta}{185.6 + \theta}\right),$$

$$\ln 0.2 = -1.609438 = \alpha \ln\left(\frac{\theta}{1{,}310.6 + \theta}\right),$$

$$\frac{-1.609438}{-0.356675} = 4.512338 = \frac{\ln\left(\frac{\theta}{1{,}310.6 + \theta}\right)}{\ln\left(\frac{\theta}{185.6 + \theta}\right)}.$$

Any of the methods from Appendix F can be used to solve this equation for
$\hat{\theta} = 715.03$. Then, from the first equation,

$$0.3 = 1 - \left(\frac{715.03}{185.6 + 715.03}\right)^{\alpha},$$

which yields $\hat{\alpha} = 1.54559$. □


The estimates are much different from those obtained in Example 12.2.
This is one indication that these methods may not be particularly reliable.
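
The Pareto equation in Example 12.5 has no closed-form solution. The Python sketch below uses simple bisection, which is only one possible choice and is not necessarily one of the methods of Appendix F. The function name `pareto_percentile_match` and the search bracket are assumptions made for the illustration.

```python
import math


def pareto_percentile_match(x1, g1, x2, g2, lo=1.0, hi=1.0e6, iterations=200):
    """Match two percentiles of a Pareto distribution F(x) = 1 - (theta/(x+theta))**alpha."""
    target = math.log(1 - g2) / math.log(1 - g1)

    def gap(theta):
        ratio = math.log(theta / (x2 + theta)) / math.log(theta / (x1 + theta))
        return ratio - target

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("the bracket [lo, hi] does not contain a solution")
    for _ in range(iterations):      # bisection on theta
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    alpha = math.log(1 - g1) / math.log(theta / (x1 + theta))
    return alpha, theta


alpha_hat, theta_hat = pareto_percentile_match(185.6, 0.3, 1310.6, 0.8)
print(alpha_hat, theta_hat)
```

The bracket check raises an error when no solution lies in the search range, in line with the warning that percentile matching equations need not have a solution.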


12.1.1 Exercises 

12.1 Determine the method-of-moments estimate for an exponential model
for Data Set B with observations censored at 250.

12.2 Determine the method-of-moments estimate for a lognormal model for 
Data Set B. 

12.3 (*) The 20th and 80th percentiles from a sample are 5 and 12,
respectively. Using the percentile matching method, estimate $S(8)$ assuming the
population has a Weibull distribution.

12.4 (*) From a sample you are given that the mean is 35,000, the standard 
deviation is 75,000, the median is 10,000, and the 90th percentile is 100,000. 
Using the percentile matching method, estimate the parameters of a Weibull 
distribution. 

12.5 (*) A sample of size 5 produced the values 4, 5, 21, 99, and 421. You 
fit a Pareto distribution using the method of moments. Determine the 95th 
percentile of the fitted distribution. 

12.6 (*) In year 1 there are 100 claims with an average size of 10,000 and in
year 2 there are 200 claims with an average size of 12,500. Inflation increases
the size of all claims by 10% per year. A Pareto distribution with $\alpha = 3$ and
$\theta$ unknown is used to model the claim size distribution. Estimate $\theta$ for year
3 using the method of moments.

12.7 (*) From a random sample the 20th percentile is 18.25 and the 80th 
percentile is 35.8. Estimate the parameters of a lognormal distribution using 
percentile matching and then use these estimates to estimate the probability 
of observing a value in excess of 30. 

12.8 (*) A claim process is a mixture of two random variables $A$ and $B$, where
$A$ has an exponential distribution with a mean of 1 and $B$ has an exponential
distribution with a mean of 10. A weight of $p$ is assigned to distribution $A$ and
$1 - p$ to distribution $B$. The standard deviation of the mixture is 2. Estimate
$p$ by the method of moments.

12.9 (*) A random sample of 20 observations has been ordered as follows: 

12 16 20 23 26 28 30 32 33 35 

36 38 39 40 41 43 45 47 50 57 

Determine the 60th sample percentile using the smoothed empirical estimate. 

12.10 (*) The following 20 wind losses (in millions of dollars) were recorded 
in one year: 

1 1 1 1 1 2 2 3 3 4
6 6 8 10 13 14 15 18 22 25

Determine the sample 75th percentile using the smoothed empirical esti¬ 
mate. 

12.11 (*) The observations 1,000, 850, 750, 1,100, 1,250, and 900 were
obtained as a random sample from a gamma distribution with unknown
parameters $\alpha$ and $\theta$. Estimate these parameters by the method of moments.

12.12 (*) A random sample of claims has been drawn from a loglogistic dis¬ 
tribution. In the sample, 80% of the claims exceed 100 and 20% exceed 400. 
Estimate the loglogistic parameters by percentile matching. 

12.13 (*) Let $x_1, \ldots, x_n$ be a random sample from a population with cdf

$F(x) = x^p$, $0 < x < 1$. Determine the method of moments estimate of $p$.

12.14 (*) A random sample of 10 claims obtained from a gamma distribution
is given below:

1,500 6,000 3,500 3,800 1,800 5,500 4,800 4,200 3,900 3,000

Estimate $\alpha$ and $\theta$ by the method of moments.

12.15 (*) A random sample of five claims from a lognormal distribution is
given below:

500 1,000 1,500 2,500 4,500.

Estimate $\mu$ and $\sigma$ by the method of moments. Estimate the probability
that a loss will exceed 4,500.

12.16 (*) The random variable $X$ has pdf $f(x) = \beta^{-2} x \exp(-0.5x^2/\beta^2)$, $x$,
$\beta > 0$. For this random variable, $E(X) = (\beta/2)\sqrt{2\pi}$ and $\mathrm{Var}(X) = 2\beta^2 -
\pi\beta^2/2$. You are given the following five observations:

4.9 1.8 3.4 6.9 4.0

Determine the method-of-moments estimate of $\beta$.

Table 12.1 Data for Exercise 12.18

No. of claims    No. of policies
     0               9,048
     1                 905
     2                  45
     3                   2
     4+                  0

Table 12.2 Data for Exercise 12.19

No. of claims    No. of policies
     0                 861
     1                 121
     2                  13
     3                   3
     4                   1
     5                   0
     6                   1
     7+                  0


12.17 The random variable $X$ has pdf $f(x) = \alpha\lambda^{\alpha}(\lambda + x)^{-\alpha-1}$, $x, \alpha, \lambda > 0$.
It is known that $\lambda = 1{,}000$. You are given the following five observations:

43 145 233 396 775

Determine the method of moments estimate of $\alpha$.

12.18 Use the data in Table 12.1 to determine the method-of-moments
estimate of the parameters of the negative binomial model.

12.19 Use the data in Table 12.2 to determine the method-of-moments
estimate of the parameters of the negative binomial model.

12.2 MAXIMUM LIKELIHOOD ESTIMATION
12.2.1 Introduction 

Estimation by the method of moments and percentile matching is often easy
to do, but these estimators tend to perform poorly. The main reason for this
is that they use a few features of the data, rather than the entire set of
observations. It is particularly important to use as much information as possible



parameters for the normal distribution, the sample mean and variance are
sufficient.3 However, when estimating parameters for a Pareto distribution,
it is important to know all the extreme observations in order to successfully
estimate $\alpha$. Another drawback of these methods is that they require that all
the observations are from the same random variable. Otherwise, it is not clear
what to use for the population moments or percentiles. For example, if half
the observations have a deductible of 50 and half have a deductible of 100,
it is not clear to what the sample mean should be equated.4 Finally, these
methods allow the analyst to make arbitrary decisions regarding the moments
or percentiles to use.

There are a variety of estimators that use the individual data points. All
of them are implemented by setting an objective function and then determining
the parameter values that optimize that function. For example, we could
estimate parameters by minimizing the maximum difference between the
distribution function for the parametric model and the distribution function for
the Nelson-Aalen estimate. Of the many possibilities, the only one used here



meter values. It is possible that as various parameters become zero or infinite 
the likelihood function will continue to increase. Care must be taken when 
maximizing this function because there may be local maxima in addition to 
the global maximum. Often, it is not possible to analytically maximize the 
likelihood function (by setting partial derivatives equal to zero). Numerical 
approaches, such as those outlined in Appendix F, will usually be needed. 

Because the observations are assumed to be independent, the product in
the definition represents the joint probability $\Pr(X_1 \in A_1, \ldots, X_n \in A_n \mid \theta)$,
that is, the likelihood function is the probability of obtaining the sample
results that were obtained, given a particular parameter value. The estimate
is then the parameter value that produces the model under which the actual
observations are most likely to be observed. One of the major attractions of
this estimator is that it is almost always available. That is, if you can write an
expression for the desired probabilities, you can execute this method. If you
cannot write and evaluate an expression for probabilities using your model,
there is no point in postulating that model in the first place because you will
not be able to use it to solve your problem.



depend on the same parameter vector, 0. In addition, the random variables 
are assumed to be independent. 

Definition 12.6 The likelihood function is

$$L(\theta) = \prod_{j=1}^{n} \Pr(X_j \in A_j \mid \theta)$$

and the maximum likelihood estimate of $\theta$ is the vector that maximizes
the likelihood function.5


3 This applies both in the formal statistical definition of sufficiency (not covered here) and 
in the conventional sense. If the population has a normal distribution, the sample mean 
and variance convey as much information as the original observations. 

4 One way to rectify that drawback is to first determine a data-dependent model such as 

the Kaplan-Meier estimate. Then use percentiles or moments from that model. 


$$f(27)f(82) \cdots f(243) = \theta^{-1}e^{-27/\theta}\,\theta^{-1}e^{-82/\theta} \cdots \theta^{-1}e^{-243/\theta} = \theta^{-7}e^{-909/\theta}.$$

For the final 13 terms, the set $A_j$ is the interval from 250 to infinity and
therefore $\Pr(X_j \in A_j) = \Pr(X_j > 250) = e^{-250/\theta}$. There are 13 such factors
making the likelihood function

$$L(\theta) = \theta^{-7}e^{-909/\theta}(e^{-250/\theta})^{13} = \theta^{-7}e^{-4{,}159/\theta}.$$

It is easier to maximize the logarithm of the likelihood function. Because it
occurs so often, we denote the loglikelihood function as $l(\theta) = \ln L(\theta)$.
Then

$$l(\theta) = -7\ln\theta - 4{,}159\theta^{-1},$$

$$l'(\theta) = -7\theta^{-1} + 4{,}159\theta^{-2} = 0,$$

$$\hat{\theta} = \frac{4{,}159}{7} = 594.14.$$

In this case, the calculus technique of setting the first derivative equal to zero












，， individual data 


runcation, and no censoring and the value of each observa- 


扣 caojr wxiuc biic jiugiiKemiooa runction. 

$$L(\theta) = \prod_{j=1}^{n} f_{X_j}(x_j \mid \theta), \quad l(\theta) = \sum_{j=1}^{n} \ln f_{X_j}(x_j \mid \theta).$$

The notation indicates that it is not necessary for each observation to come 
from the same distribution. 

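Example 12.8 Using Data Set B, determine the maximum likelihood
estimates for an exponential distribution, for a gamma distribution where $\alpha$ is
known to equal 2, and for a gamma distribution where both parameters are
unknown.

For the exponential distribution, the general solution is

$$l(\theta) = \sum_{j=1}^{n}(-\ln\theta - x_j\theta^{-1}) = -n\ln\theta - n\bar{x}\theta^{-1},$$

$$l'(\theta) = -n\theta^{-1} + n\bar{x}\theta^{-2} = 0,$$

$$\hat{\theta} = \bar{x}.$$

For Data Set B, $\hat{\theta} = \bar{x} = 1{,}424.4$. The value of the loglikelihood function is
$-165.23$. For this situation the method-of-moments and maximum likelihood
estimates are identical.

For the gamma distribution with $\alpha = 2$,

$$f(x \mid \theta) = \frac{x^{2-1}e^{-x/\theta}}{\Gamma(2)\theta^2} = x\theta^{-2}e^{-x/\theta},$$

$$\ln f(x \mid \theta) = \ln x - 2\ln\theta - x\theta^{-1},$$

$$l(\theta) = \sum_{j=1}^{n}\ln x_j - 2n\ln\theta - n\bar{x}\theta^{-1},$$

$$l'(\theta) = -2n\theta^{-1} + n\bar{x}\theta^{-2} = 0,$$

$$\hat{\theta} = \bar{x}/2.$$

For Data Set B, $\hat{\theta} = 1{,}424.4/2 = 712.2$ and the value of the loglikelihood
function is $-179.98$. Again, this estimate is the same as the method of moments
estimate.

For the gamma distribution with unknown parameters the equation is not
as simple.

$$f(x \mid \alpha, \theta) = \frac{x^{\alpha-1}e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}},$$

$$\ln f(x \mid \alpha, \theta) = (\alpha - 1)\ln x - x\theta^{-1} - \ln\Gamma(\alpha) - \alpha\ln\theta.$$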



of the loglikelihood function is $-162.29$. These do not match the
method-of-moments estimates. □
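
The two closed-form cases of Example 12.8 can be checked with a short Python sketch. The function names are hypothetical, and `data_set_b` is an assumption: it is Data Set B as it is understood here, consistent with the sample mean 1,424.4 quoted in the text.

```python
import math

# Assumed values of Data Set B (20 observations).
data_set_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]


def exponential_loglikelihood(theta, sample):
    """l(theta) = sum of (-ln theta - x_j / theta)."""
    return sum(-math.log(theta) - x / theta for x in sample)


def gamma2_loglikelihood(theta, sample):
    """l(theta) for a gamma distribution with alpha = 2: sum of (ln x_j - 2 ln theta - x_j / theta)."""
    return sum(math.log(x) - 2 * math.log(theta) - x / theta for x in sample)


x_bar = sum(data_set_b) / len(data_set_b)
theta_exponential = x_bar        # maximizes the exponential loglikelihood
theta_gamma2 = x_bar / 2         # maximizes the gamma (alpha = 2) loglikelihood

l_exponential = exponential_loglikelihood(theta_exponential, data_set_b)
l_gamma2 = gamma2_loglikelihood(theta_gamma2, data_set_b)
print(theta_exponential, l_exponential)
print(theta_gamma2, l_gamma2)
```

Evaluating either function slightly away from its estimate gives a smaller value, which is a quick way to confirm that a stationary point is a maximum.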


12.2.3 Complete, grouped data 

When data are complete and grouped, the observations may be summarized
as follows. Begin with a set of numbers $c_0 < c_1 < \cdots < c_k$, where $c_0$ is the
smallest possible observation (often zero) and $c_k$ is the largest possible
observation (often infinity). From the sample, let $n_j$ be the number of observations
in the interval $(c_{j-1}, c_j]$. For such data, the likelihood function is

$$L(\theta) = \prod_{j=1}^{k}[F(c_j \mid \theta) - F(c_{j-1} \mid \theta)]^{n_j}$$

and its logarithm is

$$l(\theta) = \sum_{j=1}^{k} n_j \ln[F(c_j \mid \theta) - F(c_{j-1} \mid \theta)].$$

Example 12.9 From Data Set C, determine the maximum likelihood
estimate for an exponential distribution.

The loglikelihood function is

$$l(\theta) = 99\ln[F(7{,}500) - F(0)] + 42\ln[F(17{,}500) - F(7{,}500)] + \cdots + 3\ln[1 - F(300{,}000)]$$

$$= 99\ln(1 - e^{-7{,}500/\theta}) + 42\ln(e^{-7{,}500/\theta} - e^{-17{,}500/\theta}) + \cdots + 3\ln e^{-300{,}000/\theta}.$$

A numerical routine is needed to produce $\hat{\theta} = 29{,}721$ and the value of the
loglikelihood function is $-406.03$. □
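
The grouped-data loglikelihood is straightforward to code, and a one-dimensional search can then locate the maximum. The Python sketch below is illustrative only: the group boundaries and counts are hypothetical (they are not Data Set C), the golden-section search is one of many possible numerical routines, and all function names are assumptions.

```python
import math


def exponential_survival(x, theta):
    """S(x) = exp(-x / theta), with S(infinity) = 0."""
    return 0.0 if math.isinf(x) else math.exp(-x / theta)


def grouped_loglikelihood(theta, boundaries, counts):
    """l(theta) = sum of n_j * ln[F(c_j) - F(c_{j-1})], written with survival functions."""
    total = 0.0
    for lower, upper, n in zip(boundaries[:-1], boundaries[1:], counts):
        prob = exponential_survival(lower, theta) - exponential_survival(upper, theta)
        total += n * math.log(prob)
    return total


def golden_section_max(f, lo, hi, iterations=200):
    """Maximize a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            a = c
        else:
            b = d
    return 0.5 * (a + b)


# Hypothetical grouped data: boundaries c_0 < ... < c_k and counts n_1, ..., n_k.
boundaries = [0.0, 1000.0, 5000.0, 20000.0, math.inf]
counts = [40, 35, 20, 5]

theta_hat = golden_section_max(
    lambda t: grouped_loglikelihood(t, boundaries, counts), 100.0, 100000.0)
print(theta_hat, grouped_loglikelihood(theta_hat, boundaries, counts))
```

Writing each interval probability as a difference of survival functions avoids taking the logarithm of zero when $F$ rounds to 1 in the upper tail.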

12.2.4 Truncated or censored data 

When data are censored, there is no additional complication. As noted in 
Example 12.7, right censoring simply creates an interval running from the 
censoring point to infinity. In that example, data below the censoring point 
were individual data, and so the likelihood function contains both density and 
distribution function terms. 
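
For the exponential distribution with right censoring, the loglikelihood has a closed-form maximum. The Python sketch below reproduces the figures of Example 12.7 from the quantities quoted there (seven exact values summing to 909 and 13 values censored at 250); the function names are hypothetical.

```python
import math


def censored_exponential_loglikelihood(theta, sum_exact, n_exact, censoring_points):
    """Density terms for exact values plus survival terms for censored ones."""
    density_part = -n_exact * math.log(theta) - sum_exact / theta
    survival_part = -sum(censoring_points) / theta
    return density_part + survival_part


def censored_exponential_mle(sum_exact, n_exact, censoring_points):
    """theta-hat = (sum of exact values + sum of censoring points) / (number of exact values)."""
    return (sum_exact + sum(censoring_points)) / n_exact


censoring_points = [250] * 13
theta_hat = censored_exponential_mle(909, 7, censoring_points)
print(theta_hat)
```

Only the uncensored observations appear in the denominator, while every observation contributes to the numerator.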

Truncated data present more of a challenge. There are two ways to proceed.
One is to shift the data by subtracting the truncation point from each
observation. The other is to accept the fact that there is no information
about values below the truncation point but then attempt to fit a model for
the original population.


Example 12.10 Assume the values in Data Set B had been truncated from
below at 200. Using both methods, estimate the value of $\alpha$ for a Pareto
distribution with $\theta = 800$ known. Then use the model to estimate the cost per
payment with deductibles of 0, 200, and 400.


Using the shifting approach, the values become 43, 94, 140, 184, 257, 480,
655, 677, 774, 993, 1,140, 1,684, 2,358, and 15,543. The likelihood function is

$$L(\alpha) = \prod_{j=1}^{14}\frac{\alpha(800^{\alpha})}{(800 + x_j)^{\alpha+1}},$$

$$l(\alpha) = \sum_{j=1}^{14}[\ln\alpha + \alpha\ln 800 - (\alpha + 1)\ln(x_j + 800)]$$

$$= 14\ln\alpha + 93.5846\alpha - 103.969(\alpha + 1)$$

$$= 14\ln\alpha - 103.969 - 10.384\alpha,$$

$$l'(\alpha) = 14\alpha^{-1} - 10.384,$$

$$\hat{\alpha} = \frac{14}{10.384} = 1.3482.$$


Because the data have been shifted, it is not possible to estimate the cost with
no deductible. With a deductible of 200, the expected cost is the expected
value of the estimated Pareto distribution, 800/0.3482 = 2,298. Raising the
deductible to 400 is equivalent to imposing a deductible of 200 on the modeled
distribution. From Theorem 5.13, the expected cost per payment is



For the unshifted approach we need to ask the key question required when 

■jnS'hmr'finOP ■ft'hts - 4.1,__l___ 
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tion is (where the Xj values axe now the original values) 

n n[ 

ii 1 — F(200|a) L (800 + Xj) a + l / V800 + 200 〉 

TT qCi.ooo 0 ) 

y j 800 + ^+ 1, 

14 

141im + 14a In 1,000 - (a + 1) ^ ln(800 + xj), 

J=1 

=14 In a + 96.709a - (a +1)105.810, 

V{a) = 14a- 1 - 9.101, 
a = 1.5383. 

This model is for losses with no deductible, and therefore the expected cost 
without a deductible is 800/0.5383 = 1,486. Imposing deductibles of 200 and 
400 produces the following results: 


\[
\frac{E(X) - E(X \wedge 200)}{1 - F(200)} = \frac{1{,}000}{0.5383} = 1{,}858,
\qquad
\frac{E(X) - E(X \wedge 400)}{1 - F(400)} = \frac{1{,}200}{0.5383} = 2{,}229.
\]


It should now be clear that the contribution to the likelihood function can 
be written for most any observation. The following two steps summarize the 
process: 

1. For the numerator, use $f(x)$ if the exact value, $x$, of the observation is
known. If it is only known that the observation is between $y$ and $z$, use
$F(z) - F(y)$.

2. For the denominator, let d be the truncation point (use zero if there is 
no truncation). The denominator is then 1 — F(d). 
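
The two steps above can be sketched in code. The following is a minimal illustration, not the text's own implementation: it assumes the Pareto model of Example 12.10 with $\theta = 800$ known, and the function names (`pareto_sf`, `loglik`, `alpha_mle`) are hypothetical. The data are the shifted values listed in Example 12.10, with 200 added back to recover the original observations.

```python
import math

THETA = 800.0  # known Pareto scale parameter from Example 12.10


def pareto_sf(x, alpha, theta=THETA):
    """Survival function 1 - F(x) of the two-parameter Pareto distribution."""
    return (theta / (theta + x)) ** alpha


def pareto_logpdf(x, alpha, theta=THETA):
    """Logarithm of the Pareto density f(x)."""
    return (math.log(alpha) + alpha * math.log(theta)
            - (alpha + 1) * math.log(theta + x))


def loglik(alpha, exact=(), intervals=(), d=0.0, theta=THETA):
    """Loglikelihood assembled from numerator and denominator contributions."""
    log_denom = math.log(pareto_sf(d, alpha, theta))  # step 2: ln[1 - F(d)]
    total = 0.0
    for x in exact:  # step 1, exact value known: f(x)
        total += pareto_logpdf(x, alpha, theta) - log_denom
    for y, z in intervals:  # step 1, value known to lie in (y, z): F(z) - F(y)
        prob = pareto_sf(y, alpha, theta) - pareto_sf(z, alpha, theta)
        total += math.log(prob) - log_denom
    return total


def alpha_mle(exact, d=0.0, theta=THETA):
    """Closed-form maximizer of loglik when all observations are exact."""
    return len(exact) / sum(math.log((theta + x) / (theta + d)) for x in exact)


shifted = [43, 94, 140, 184, 257, 480, 655, 677, 774, 993,
           1140, 1684, 2358, 15543]
original = [x + 200 for x in shifted]

alpha_shifted = alpha_mle(shifted)               # shifting approach
alpha_unshifted = alpha_mle(original, d=200.0)   # truncation at 200 kept in the model
print(round(alpha_shifted, 4), round(alpha_unshifted, 4))
```

The closed form in `alpha_mle` comes from setting $l'(\alpha) = 0$ and applies only when there are no interval (censored) observations; with censoring, `loglik` would have to be maximized numerically.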

Example 12.11 Determine Pareto and gamma models for the time to death 
for Data Set D2. 

Table 12.3 shows how the likelihood function is constructed for these values.
For deaths, the time is known and so the exact value of $x$ is available. For
surrenders or those reaching time 5, the observation is censored and therefore
death is known to be some time in the interval from the surrender time, $y$, to
infinity.






likelihood estimate for a Poisson distribution and for a binomial distribution 
with m = 8. 


In general, for a discrete distribution with complete data, the likelihood
function is

\[
L(\theta) = \prod_{j} \left[p(x_j \mid \theta)\right]^{n_j},
\]

where $x_j$ is one of the observed values, $p(x_j \mid \theta)$ is the probability of
observing $x_j$, and $n_j$ is the number of times $x_j$ was observed in the sample.
For the Poisson distribution



12.2.5 Exercises 

12.20 Repeat Example 12.8 using the inverse exponential, inverse gamma 
with a = 2, and inverse gamma distributions. Compare your estimates with 
the method-of-moments estimates. 

12.21 From Data Set C, determine the maximum likelihood estimates for
gamma, inverse exponential, and inverse gamma distributions.

12.22 Determine maximum likelihood estimates for Data Set B using the 
inverse exponential, gamma, and inverse gamma distributions. Assume the 










observed au, 

at age $x + 0.5$. Determine the maximum likelihood estimate of $q_x$.

12.29 (*) Ten lives are subject to the survival function


where $t$ is time since birth. There are 10 lives observed from birth. At time 10,
2 of the lives die and the other 8 are withdrawn from observation. Determine
the maximum likelihood estimate of $k$.



12.35 (*) You are given the five observations 521, 658, 702, 819, and 1,217.
Your model is the single-parameter Pareto distribution with distribution
function

\[
F(x) = 1 - \left(\frac{500}{x}\right)^{\alpha}, \quad x > 500,\ \alpha > 0.
\]

Determine the maximum likelihood estimate of $\alpha$.


12.36 (*) You have observed the following five claim severities: 11.0, 15.2,
18.0, 21.0, and 25.8. Determine the maximum likelihood estimate of $\mu$ for the
following model:

\[
f(x) = \frac{1}{\sqrt{2\pi x}} \exp\left[-\frac{1}{2x}(x - \mu)^2\right], \quad x, \mu > 0.
\]

12.37 (*) A random sample of size 5 is taken from a Weibull distribution
with $\tau = 2$. Two of the sample observations are known to exceed 50 and the
three remaining observations are 20, 30, and 45. Determine the maximum
likelihood estimate of $\theta$.

12.38 (*) Phil and Sylvia are competitors in the light bulb business. Sylvia
advertises that her light bulbs burn twice as long as Phil's. You were able
to test 20 of Phil's bulbs and 10 of Sylvia's. You assumed that both of their
bulbs have an exponential distribution with time measured in hours. You have
separately estimated the parameters as $\hat\theta_P = 1{,}000$ and $\hat\theta_S = 1{,}500$ for Phil


claim, that $\theta_S = 2\theta_P$.

12.39 (*) A sample of 100 losses revealed 
38 were above 1,000. An exponential distribu 



of $\theta$. Now suppose you are also given that the 62 losses that were below 1,000
totalled 28,140 while the total for the 38 above 1,000 remains unknown. Using
this additional information, determine the maximum likelihood estimate of $\theta$.



(*) The following values were calculated from a random sample of 10

\[
\sum_{j=1}^{10} x_j^{-2} = 0.00033674, \quad
\sum_{j=1}^{10} x_j^{-1} = 0.023999, \quad
\sum_{j=1}^{10} x_j^{-0.5} = 0.34445,
\]
\[
\sum_{j=1}^{10} x_j^{0.5} = 488.97, \quad
\sum_{j=1}^{10} x_j = 31{,}939, \quad
\sum_{j=1}^{10} x_j^{2} = 211{,}498{,}983.
\]

Losses come from a Weibull distribution with $\tau = 0.5$ [so $F(x) = 1 - e^{-(x/\theta)^{0.5}}$].
Determine the maximum likelihood estimate of $\theta$.


12.41 (*) For claims reported in 1997, the number settled in 1997 (year 0)
was unknown, the number settled in 1998 (year 1) was 3, and the number
settled in 1999 (year 2) was 1. The number settled after 1999 is unknown.
For claims reported in 1998 there were 5 settled in year 0, 2 settled in year 1,
and the number settled after year 1 is unknown. For claims reported in 1999
there were 4 settled in year 0 and the number settled after year 0 is unknown.
Let $N$ be the year in which a randomly selected claim is settled and assume
that it has probability function $\Pr(N = n) = p_n = (1 - p)p^n$, $n = 0, 1, 2, \ldots$.
Determine the maximum likelihood estimate of $p$.

12.42 (*) A sample of $n$ independent observations $x_1, \ldots, x_n$ came from a
distribution with a pdf of $f(x) = 2\theta x \exp(-\theta x^2)$, $x > 0$. Determine the
maximum likelihood estimator (mle) of $\theta$.

12.43 (*) Let $x_1, \ldots, x_n$ be a random sample from a population with cdf
$F(x) = x^p$, $0 < x < 1$. Determine the mle of $p$.

12.44 A random sample of 10 claims obtained from a gamma distribution is 
given below: 

1,500 6,000 3,500 3,800 1,800 5,500 4,800 4,200 3,900 3,000 

(a) (*) Suppose it is known that $\alpha = 12$. Determine the maximum
likelihood estimate of $\theta$.

(b) Determine the maximum likelihood estimates of $\alpha$ and $\theta$.



a loss will exceed 4,500. 

12.46 (*) Let $x_1, \ldots, x_n$ be a random sample from a random variable with
pdf $f(x) = \theta^{-1} e^{-x/\theta}$, $x > 0$. Determine the maximum likelihood estimator
of $\theta$.


12.47 (*) The random variable $X$ has pdf $f(x) = \beta^{-2} x \exp(-0.5x^2/\beta^2)$,
$x, \beta > 0$. For this random variable, $E(X) = (\beta/2)\sqrt{2\pi}$ and
$\mathrm{Var}(X) = 2\beta^2 - \pi\beta^2/2$. You are given the following five observations:


4.9 1.8 3.4 6.9 4.0


Determine the maximum likelihood estimate of $\beta$.

12.48 (*) Let $x_1, \ldots, x_n$ be a random sample from a random variable with
cdf $F(x) = 1 - x^{-\alpha}$, $x > 1$, $\alpha > 0$. Determine the maximum likelihood
estimator of $\alpha$.


12.49 (*) The random variable $X$ has pdf $f(x) = \alpha\lambda^{\alpha}(\lambda + x)^{-\alpha-1}$, $x, \alpha, \lambda > 0$.
It is known that $\lambda = 1{,}000$. You are given the following five observations:

43 145 233 396 775 


Determine the maximum likelihood estimate of a. 
Table 12.5 Data for Exercise 12.51

Loss            No. of observations     Loss             No. of observations
0-25                     5              350-500                  17
25-50                   37              500-750                  13
50-75                   28              750-1,000                12
75-100                  31              1,000-1,500               3
100-125                 23              1,500-2,500               5
125-150                  9              2,500-5,000               5
150-200                 22              5,000-10,000              3
200-250                 17              10,000-25,000             3
250-350                 15              25,000-                   2


were couecxea. n 

> 200). When a parametric model is called for, use the single-parameter
Pareto distribution for which $F(x) = 1 - (100/x)^{\alpha}$, $x > 100$, $\alpha > 0$.


132 149 476 147 135 110 176 107 147 165 

135 117 110 111 226 108 102 108 227 102 


(a) Determine the empirical estimate of $\Pr(X > 200)$.

(b) Determine the method-of-moments estimate of the single-parameter
Pareto parameter $\alpha$ and use it to estimate $\Pr(X > 200)$.

(c) Determine the maximum likelihood estimate of the single-parameter
Pareto parameter $\alpha$ and use it to estimate $\Pr(X > 200)$.


The data in Table 12.5 presents the results of a sample of 250 losses.
Consider the inverse exponential distribution with cdf $F(x) = e^{-\theta/x}$, $x > 0$,
$\theta > 0$. Determine the maximum likelihood estimate of $\theta$.


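Consider the inverse Gaussian distribution with density given by

\[
f_X(x) = \left(\frac{\theta}{2\pi x^3}\right)^{1/2}
         \exp\left[-\frac{\theta}{2x}\left(\frac{x - \mu}{\mu}\right)^2\right], \quad x > 0.
\]

(a) Show that

\[
\sum_{j=1}^{n} \frac{(x_j - \mu)^2}{x_j}
  = \mu^2 \sum_{j=1}^{n}\left(\frac{1}{x_j} - \frac{1}{\bar x}\right) + \frac{n}{\bar x}(\bar x - \mu)^2,
\]

where $\bar x = (1/n)\sum_{j=1}^{n} x_j$.

(b) For a sample $(x_1, \ldots, x_n)$, show that the maximum likelihood
estimates of $\mu$ and $\theta$ are

\[
\hat\mu = \bar x
\quad\text{and}\quad
\hat\theta = \frac{n}{\sum_{j=1}^{n}\left(\dfrac{1}{x_j} - \dfrac{1}{\bar x}\right)}.
\]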

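12.53 Suppose that $X_1, \ldots, X_n$ are independent and normally distributed
with mean $E(X_j) = \mu$ and $\mathrm{Var}(X_j) = (\theta m_j)^{-1}$, where $m_j > 0$ is a known
constant. Prove that the maximum likelihood estimates of $\mu$ and $\theta$ are

\[
\hat\mu = \bar X
\quad\text{and}\quad
\hat\theta = n\left[\sum_{j=1}^{n} m_j (X_j - \bar X)^2\right]^{-1},
\]

where $\bar X = (1/m)\sum_{j=1}^{n} m_j X_j$ and $m = \sum_{j=1}^{n} m_j$.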


12.3 VARIANCE AND INTERVAL ESTIMATION 

In general, it is not easy to determine the variance of complicated estima¬ 
tors such as the maximum likelihood estimator. However, it is possible to 
approximate the variance. The key is a theorem that can be found in most
mathematical statistics books. The particular version stated here and its
multi-parameter generalization is taken from [112] and stated without proof.
Recall that $L(\theta)$ is the likelihood function and $l(\theta)$ its logarithm. All of the
results assume that the population has a distribution that is a member of the 


Theorem 12.13 Assume that the pdf (pf in the discrete case) $f(x;\theta)$ satisfies
the following for $\theta$ in an interval containing the true value (replace
integrals by sums for discrete variables):

(i) $\ln f(x;\theta)$ is three times differentiable with respect to $\theta$.

(ii) $\int \frac{\partial}{\partial\theta} f(x;\theta)\,dx = 0$. This allows the derivative to be taken
outside the integral and so we are just differentiating the constant 1.

(iii) $\int \frac{\partial^2}{\partial\theta^2} f(x;\theta)\,dx = 0$. This is the same concept for the second
derivative. (The integrals in (ii) and (iii) are to be evaluated over the range
of $x$ values for which $f(x;\theta) > 0$.)

(iv) $-\infty < \int f(x;\theta)\,\frac{\partial^2}{\partial\theta^2}\ln f(x;\theta)\,dx < 0$. This establishes that the
indicated integral exists and that the location where the derivative is zero is
a maximum.

(v) There exists a function $H(x)$ such that $\int H(x) f(x;\theta)\,dx < \infty$ with
$\left|\frac{\partial^3}{\partial\theta^3}\ln f(x;\theta)\right| < H(x)$. This makes sure that the population is not
overpopulated with regard to extreme values.

Then the following results hold: 

(a) As $n \to \infty$, the probability that the likelihood equation [$L'(\theta) = 0$] has a
solution goes to 1.

(b) As $n \to \infty$, the distribution of the maximum likelihood estimator $\hat\theta_n$
converges to a normal distribution with mean $\theta$ and variance such that
$I(\theta)\,\mathrm{Var}(\hat\theta_n) \to 1$, where

\[
\begin{aligned}
I(\theta) &= -nE\left[\frac{\partial^2}{\partial\theta^2}\ln f(X;\theta)\right]
           = -n\int f(x;\theta)\,\frac{\partial^2}{\partial\theta^2}\ln f(x;\theta)\,dx \\
          &= nE\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^2\right]
           = n\int f(x;\theta)\left(\frac{\partial}{\partial\theta}\ln f(x;\theta)\right)^2 dx.
\end{aligned}
\]

For any $z$, the last statement is to be interpreted as

\[
\lim_{n\to\infty} \Pr\left(\frac{\hat\theta_n - \theta}{[I(\theta)]^{-1/2}} < z\right) = \Phi(z),
\]

and therefore $[I(\theta)]^{-1}$ is a useful approximation for $\mathrm{Var}(\hat\theta_n)$. The quantity
$I(\theta)$ is called the information (sometimes more specifically, Fisher's
information). It follows from this result that the maximum likelihood estimator
is asymptotically unbiased and consistent. The conditions in statements (i)-(v)
are often referred to as "mild regularity conditions." A skeptic would
translate this statement as "conditions that are almost always true but are
often difficult to establish, so we'll just assume they hold in our case." Their




12.55. 
the result uses the logarithm of the likelihood function:

\[
I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\, l(\theta)\right]
          = E\left[\left(\frac{\partial}{\partial\theta}\, l(\theta)\right)^2\right].
\]

The only requirement here is that the same parameter value apply to each 
observation. 

If there is more than one parameter, the only change is that the vector
of maximum likelihood estimates now has an asymptotic multivariate normal
distribution. The covariance matrix of this distribution is obtained from the
inverse of the matrix with $(r, s)$th element,

\[
\begin{aligned}
I(\theta)_{rs} &= -E\left[\frac{\partial^2}{\partial\theta_s\,\partial\theta_r}\, l(\theta)\right]
                = -nE\left[\frac{\partial^2}{\partial\theta_s\,\partial\theta_r}\ln f(X;\theta)\right] \\
               &= E\left[\frac{\partial}{\partial\theta_r}\, l(\theta)\,\frac{\partial}{\partial\theta_s}\, l(\theta)\right]
                = nE\left[\frac{\partial}{\partial\theta_r}\ln f(X;\theta)\,\frac{\partial}{\partial\theta_s}\ln f(X;\theta)\right].
\end{aligned}
\]

The first expression on each line is always correct. The second expression
assumes that the likelihood is the product of $n$ identical densities. This
matrix is often called the information matrix. The information matrix also
forms the Cramér-Rao lower bound. That is, under the usual conditions, no
unbiased estimator has a smaller variance than that given by the inverse of
the information. Therefore, at least asymptotically, no unbiased estimator is
more accurate than the maximum likelihood estimator.


Example 12.14 Estimate the covariance matrix of the maximum likelihood 
The zeros off the diagonal indicate that the two parameter estimates




so true for any sample size. One thing we could do with this 
Dnstruct approximate 95% confidence intervals for the true 
3. These would be 1.96 standard deviations on either side of 


6.1379 ± 1.96(0.0965)^{1/2} = 6.1379 ± 0.6089,
1.3894 ± 1.96(0.0483)^{1/2} = 1.3894 ± 0.4308.



information matrix, it is necessary to take both derivatives 




The result is called the observed information. 


Example 12.15 Estimate the covariance in the previous example using the 
observed information. 


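Substituting the observations into the second derivatives produces

\[
\begin{aligned}
\frac{\partial^2 l}{\partial\mu^2} &= -\frac{n}{\sigma^2} = -\frac{20}{\sigma^2}, \\
\frac{\partial^2 l}{\partial\sigma\,\partial\mu} &= -2\sum_{j=1}^{n}\frac{\ln x_j - \mu}{\sigma^3}
   = -2\,\frac{122.7576 - 20\mu}{\sigma^3}, \\
\frac{\partial^2 l}{\partial\sigma^2} &= \frac{n}{\sigma^2} - 3\sum_{j=1}^{n}\frac{(\ln x_j - \mu)^2}{\sigma^4}
   = \frac{20}{\sigma^2} - 3\,\frac{792.0801 - 245.5152\mu + 20\mu^2}{\sigma^4}.
\end{aligned}
\]

Inserting the parameter estimates produces the negatives of the entries of the
observed information,

\[
\frac{\partial^2 l}{\partial\mu^2} = -10.3600, \quad
\frac{\partial^2 l}{\partial\sigma\,\partial\mu} = 0, \quad
\frac{\partial^2 l}{\partial\sigma^2} = -20.7190.
\]

Changing the signs and inverting produce the same values as in (12.1). This is
a feature of the lognormal distribution that need not hold for other models. □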

Sometimes it is not even possible to take the derivative. In that case an 
approximate second derivative can be used. A reasonable approximation is 


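\[
\begin{aligned}
\frac{\partial^2 l(\theta)}{\partial\theta_i\,\partial\theta_j} \doteq \frac{1}{h_i h_j}\Big[
  & l(\theta + \tfrac12 h_i e_i + \tfrac12 h_j e_j) - l(\theta + \tfrac12 h_i e_i - \tfrac12 h_j e_j) \\
  & - l(\theta - \tfrac12 h_i e_i + \tfrac12 h_j e_j) + l(\theta - \tfrac12 h_i e_i - \tfrac12 h_j e_j)\Big],
\end{aligned}
\]

where $e_i$ is a vector with all zeros except for a 1 in the $i$th position and
$h_i = \theta_i/10^v$, where $v$ is one-third the number of significant digits used in
calculations.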

Example 12.16 Repeat the previous example using approximate derivatives. 

Assume that there are 15 significant digits being used. Then $h_1 = 6.1379/10^5$
and $h_2 = 1.3894/10^5$. Reasonably close values are 0.00006 and 0.00001. The
first approximation is

\[
\begin{aligned}
\frac{\partial^2 l}{\partial\mu^2}
 &\doteq \frac{l(6.13796, 1.3894) - 2\,l(6.1379, 1.3894) + l(6.13784, 1.3894)}{(0.00006)^2} \\
 &= \frac{-157.71389308198 - 2(-157.71389304968) + (-157.71389305468)}{(0.00006)^2} \\
 &= -10.3604.
\end{aligned}
\]

The other two approximations are

\[
\frac{\partial^2 l}{\partial\sigma\,\partial\mu} \doteq 0.0003, \qquad
\frac{\partial^2 l}{\partial\sigma^2} \doteq -20.7208.
\]

We see that here the approximation works very well. □
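
The approximate-derivative calculation can be sketched as follows. This is an illustrative sketch only: it assumes the lognormal loglikelihood written in terms of the sums quoted in Example 12.15 ($n = 20$, $\sum \ln x_j = 122.7576$, $\sum (\ln x_j)^2 = 792.0801$), and the function names are hypothetical. It uses $h_i = \theta_i/10^5$ exactly rather than the rounded step sizes of the example, so the results agree with the text only to a few digits.

```python
import math

N = 20                 # sample size of Data Set B
SUM_LN = 122.7576      # sum of ln x_j, as quoted in Example 12.15
SUM_LN2 = 792.0801     # sum of (ln x_j)^2, as quoted in Example 12.15


def loglik(theta):
    """Lognormal loglikelihood expressed through the two sums above."""
    mu, sigma = theta
    quad = SUM_LN2 - 2.0 * mu * SUM_LN + N * mu * mu  # sum of (ln x_j - mu)^2
    return (-SUM_LN - N * math.log(sigma) - 0.5 * N * math.log(2.0 * math.pi)
            - quad / (2.0 * sigma * sigma))


def second_derivative(f, theta, i, j, v=5):
    """Central-difference approximation with step h_i = theta_i / 10**v."""
    h = [t / 10 ** v for t in theta]

    def shifted(si, sj):
        point = list(theta)
        point[i] += si * 0.5 * h[i]
        point[j] += sj * 0.5 * h[j]
        return f(point)

    return (shifted(1, 1) - shifted(1, -1)
            - shifted(-1, 1) + shifted(-1, -1)) / (h[i] * h[j])


theta_hat = [6.1379, 1.3894]
hessian = [[second_derivative(loglik, theta_hat, i, j) for j in range(2)]
           for i in range(2)]

# Observed information is the negative Hessian; invert the 2x2 matrix by hand.
a, b = -hessian[0][0], -hessian[0][1]
c, d = -hessian[1][0], -hessian[1][1]
det = a * d - b * c
cov = [[d / det, -b / det], [-c / det, a / det]]
print(hessian)
print(cov)
```

With double-precision arithmetic the differences are taken between loglikelihood values that agree to many digits, so only the leading few digits of each approximate derivative should be trusted.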


The information matrix provides a method for assessing the quality of the 
maximum likelihood estimators of a distribution’s parameters. However, we 
are often more interested in a quantity that is a function of the parameters. 
For example, we might be interested in the lognormal mean as an estimate of
the population mean. That is, we want to use $\exp(\hat\mu + \hat\sigma^2/2)$ as an estimate
of the population mean, where the maximum likelihood estimates of the
parameters are used. It is very difficult to evaluate the mean and variance of
this random variable because it is a complex function of two variables that
already have complex distributions. The following theorem (from [108]) can
help. The method is often called the delta method.


Theorem 12.17 Let $\mathbf{X}_n = (X_{1n}, \ldots, X_{kn})^T$ be a multivariate random variable
of dimension $k$ based on a sample of size $n$. Assume that $\mathbf{X}$ is asymptot-






asymptotically normal with mean 
；is the vector of first derivatives, 
Let $\hat\theta$ be an estimator of $\theta$ that has an asymptotic normal distribution with
mean $\theta$ and variance $\sigma^2/n$. Then $g(\hat\theta)$ has an asymptotic normal distribution
with mean $g(\theta)$ and asymptotic variance $[g'(\theta)](\sigma^2/n)[g'(\theta)] = g'(\theta)^2\sigma^2/n$.

Example 12.18 Use the delta method to approximate the variance of the 
maximum likelihood estimator of the probability that an observation from an 
exponential distribution exceeds 200. Apply this result to Data Set B. 

From Example 12.8 we know that the maximum likelihood estimate of
the exponential parameter is the sample mean. We are asked to estimate
$p = \Pr(X > 200) = \exp(-200/\theta)$. The maximum likelihood estimate is
$\hat p = \exp(-200/\hat\theta) = \exp(-200/\bar x)$. Determining the mean and variance of
this quantity is not easy. But we do know that $\mathrm{Var}(\bar X) = \mathrm{Var}(X)/n = \theta^2/n$.
Furthermore,

\[
g(\theta) = e^{-200/\theta}, \qquad g'(\theta) = 200\,\theta^{-2} e^{-200/\theta},
\]

and therefore the delta method gives

\[
\mathrm{Var}(\hat p) \doteq \frac{(200\,\theta^{-2} e^{-200/\theta})^2\,\theta^2}{n}
                     = \frac{40{,}000\,\theta^{-2} e^{-400/\theta}}{n}.
\]


$\bar x = 1{,}424.4$,
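
As a hedged numerical sketch of the formula just derived, the following evaluates the delta-method variance at $\bar x = 1{,}424.4$ with $n = 20$ (the sample size of Data Set B used elsewhere in this section). The helper name `delta_method_variance` is hypothetical.

```python
import math


def delta_method_variance(g_prime, theta_hat, var_theta_hat):
    """One-parameter delta method: Var[g(theta_hat)] ~ g'(theta)^2 Var(theta_hat)."""
    return g_prime(theta_hat) ** 2 * var_theta_hat


n = 20          # sample size of Data Set B
x_bar = 1424.4  # sample mean, which is the mle of the exponential parameter


def g(theta):
    """Probability that an exponential observation exceeds 200."""
    return math.exp(-200.0 / theta)


def g_prime(theta):
    """Derivative of g with respect to theta."""
    return 200.0 * theta ** -2 * math.exp(-200.0 / theta)


p_hat = g(x_bar)
# Var(theta_hat) = theta^2 / n, estimated by substituting the sample mean.
var_p_hat = delta_method_variance(g_prime, x_bar, x_bar ** 2 / n)
half_width = 1.96 * math.sqrt(var_p_hat)  # half-width of an approximate 95% interval
print(p_hat, var_p_hat, half_width)
```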


and the estimates of these quantities are 1,215.75 and 1,689.16, respectively.
The delta method produces the following approximation:

\[
\mathrm{Var}[g(\hat\mu, \hat\sigma)] \doteq
\begin{bmatrix} 1{,}215.75 & 1{,}689.16 \end{bmatrix}
\begin{bmatrix} 0.0965 & 0 \\ 0 & 0.0483 \end{bmatrix}
\begin{bmatrix} 1{,}215.75 \\ 1{,}689.16 \end{bmatrix}
= 280{,}444.
\]

The confidence interval is $1{,}215.75 \pm 1.96\sqrt{280{,}444}$ or $1{,}215.75 \pm 1{,}037.96$.

The customary confidence interval for a population mean is $\bar x \pm 1.96\,s/\sqrt{n}$,
where $s^2$ is the sample variance. For Data Set B the interval is
$1{,}424.4 \pm 1.96(3{,}435.04)/\sqrt{20}$ or $1{,}424.4 \pm 1{,}505.47$. It is not surprising that
this is a wider interval because we know that (for a lognormal population) the
maximum likelihood estimator is asymptotically UMVUE. □
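
The multi-parameter delta method used for the lognormal mean can be sketched as the quadratic form of the gradient with the estimated covariance matrix. This is an illustration under the values quoted in this section (estimates 6.1379 and 1.3894, variances 0.0965 and 0.0483, zero covariance); the function name `delta_variance` is hypothetical.

```python
import math


def delta_variance(gradient, cov):
    """Quadratic form gradient^T * cov * gradient for the delta method."""
    k = len(gradient)
    return sum(gradient[r] * cov[r][s] * gradient[s]
               for r in range(k) for s in range(k))


mu_hat, sigma_hat = 6.1379, 1.3894
cov = [[0.0965, 0.0],
       [0.0, 0.0483]]

# g(mu, sigma) = exp(mu + sigma^2 / 2) is the lognormal mean.
mean_hat = math.exp(mu_hat + sigma_hat ** 2 / 2.0)
# Partial derivatives of g: dg/dmu = g and dg/dsigma = sigma * g.
gradient = [mean_hat, sigma_hat * mean_hat]

var_mean = delta_variance(gradient, cov)
half_width = 1.96 * math.sqrt(var_mean)
print(mean_hat, gradient, var_mean, half_width)
```

Because the covariance matrix is passed in full, the same function handles estimates that are correlated; the zero off-diagonal entries here are specific to the lognormal example.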

12.3.1 Exercises 

12.54 Determine 95% confidence intervals for the parameters of exponential 
and gamma models for Data Set B. The likelihood function and maximum 
likelihood estimates were determined in Example 12.8. 

12.55 Let $X$ have a uniform distribution on the interval from 0 to $\theta$. Show
that the maximum likelihood estimator is $\hat\theta = \max(X_1, \ldots, X_n)$. Use
Examples 9.7 and 9.10 to show that this estimator is asymptotically unbiased and
to obtain its variance. Show that Theorem 12.13 yields a negative estimate
of the variance and that item (ii) in the conditions does not hold.

12.56 Use the delta method to construct a 95% confidence interval for the 
mean of a gamma distribution using Data Set B. Preliminary calculations are 
in Exercise 12.54. 

12.57 (*) For a lognormal distribution with parameters $\mu$ and $\sigma$ you are given
that the maximum likelihood estimates are $\hat\mu = 4.215$ and $\hat\sigma = 1.093$. The
estimated covariance matrix of $(\hat\mu, \hat\sigma)$ is

\[
\begin{bmatrix} 0.1195 & 0 \\ 0 & 0.0597 \end{bmatrix}.
\]

The mean of a lognormal distribution is given by $\exp(\mu + \sigma^2/2)$. Estimate the
variance of the maximum likelihood estimator of the mean of this lognormal
distribution using the delta method.

12.58 (*) A distribution has two parameters, $\alpha$ and $\beta$. A sample of size 10
produced the following loglikelihood function:

\[
l(\alpha, \beta) = -2.5\alpha^2 - 3\alpha\beta - \beta^2 + 50\alpha + 2\beta + k,
\]

where $k$ is a constant. Estimate the covariance matrix of the maximum
likelihood estimator $(\hat\alpha, \hat\beta)$.

12.59 In Exercise 12.39 two maximum likelihood estimates were obtained for
the same model. The second estimate was based on more information than
the first one. It would be reasonable to expect that the second estimate is
more accurate. Confirm this by estimating the variance of each of the two
estimators. Do your calculations using the observed likelihood.

12.60 This is a continuation of Exercise 12.43. Let $x_1, \ldots, x_n$ be a random
sample from a population with cdf $F(x) = x^p$, $0 < x < 1$.

(a) Determine the asymptotic variance of the maximum likelihood
estimator of $p$.

(b) Use your answer to obtain a general formula for a 95% confidence 
interval for p. 

(c) Determine the maximum likelihood estimator of E(X) and obtain 
its asymptotic variance and a formula for a 95% confidence interval. 

12.61 This is a continuation of Exercise 12.46. Let $x_1, \ldots, x_n$ be a random
sample from a population with pdf $f(x) = \theta^{-1} e^{-x/\theta}$, $x > 0$.

(a) Determine the asymptotic variance of the maximum likelihood
estimator of $\theta$.

(b) (*) Use your answer to obtain a general formula for a 95% confidence
interval for $\theta$.

(c) Determine the maximum likelihood estimator of Var(X) and obtain 
its asymptotic variance and a formula for a 95% confidence interval. 

12.62 (*) A sample of size 40 has been taken from a population with pdf
$f(x) = (2\pi\theta)^{-1/2} e^{-x^2/(2\theta)}$, $-\infty < x < \infty$, $\theta > 0$. The maximum likelihood
estimate of $\theta$ is $\hat\theta = 2$. Approximate the MSE of $\hat\theta$.

12.63 Four observations were made from a random variable having the
density function $f(x) = 2\lambda x e^{-\lambda x^2}$, $x, \lambda > 0$. Exactly one of the four observations
was less than 2.

(a) (*) Determine the maximum likelihood estimator of $\lambda$.

(b) Approximate the variance of the maximum likelihood estimator of
$\lambda$.

12.64 Estimate the covariance matrix of the maximum likelihood estimators
for the data in Exercise 12.44 with both $\alpha$ and $\theta$ unknown. Do this by
computing approximate derivatives of the loglikelihood. Then construct a 95%
confidence interval for the mean.

12.65 Estimate the variance of the maximum likelihood estimator for
Exercise 12.49 and use it to construct a 95% confidence interval for $E(X \wedge 500)$.
12.66 Consider a random sample of size $n$ from a Weibull distribution. For
this exercise, write the Weibull survival function as

\[
S(x) = \exp\left\{-\left[\frac{\Gamma(1 + \tau^{-1})\,x}{\mu}\right]^{\tau}\right\}.
\]

For this exercise, assume that $\tau$ is known and that only $\mu$ is to be estimated.

(a) Show that $E(X) = \mu$.

(b) Show that the maximum likelihood estimate of $\mu$ is

\[
\hat\mu = \Gamma(1 + \tau^{-1})\left(\frac{1}{n}\sum_{j=1}^{n} x_j^{\tau}\right)^{1/\tau}.
\]

(c) Show that using the observed information produces the variance
estimate

\[
\widehat{\mathrm{Var}}(\hat\mu) = \frac{\hat\mu^2}{n\tau^2},
\]

where $\mu$ is replaced by $\hat\mu$.

(d) Show that using the information (again replacing $\mu$ with $\hat\mu$)
produces the same variance estimate as in part (c).

(e) Show that $\hat\mu$ has a transformed gamma distribution with $\alpha = n$,
$\theta = \mu n^{-1/\tau}$, and $\tau = \tau$. Use this to obtain the exact variance of $\hat\mu$
(as a function of $\mu$). Hint: The variable $X^{\tau}$ has an exponential
distribution and so the variable $\sum_{j=1}^{n} X_j^{\tau}$ has a gamma distribution
with first parameter equal to $n$ and second parameter equal to the
mean of the exponential distribution.


12.4 BAYESIAN ESTIMATION 

All of the previous discussion on estimation has assumed a frequentist ap¬ 
proach. That is, the population distribution is fixed but unknown, and our 
decisions are concerned not only with the sample we obtained from the pop¬ 
ulation but also with the possibilities attached to other samples that might 
have been obtained. The Bayesian approach assumes that only the data ac¬ 
tually observed are relevant and it is the population that is variable. For 
parameter estimation the following definitions describe the process and then 
Bayes’ theorem provides the solution. 

12.4.1 Definitions and Bayes’ theorem 

Definition 12.20 The prior distribution is a probability distribution over
the space of possible parameter values. It is denoted $\pi(\theta)$ and represents our
opinion concerning the relative chances that various values of $\theta$ are the true
value of the parameter.

As before, the parameter $\theta$ may be scalar or vector valued. Determination
of the prior distribution has always been one of the barriers to the widespread
acceptance of Bayesian methods. It is almost certainly the case that
your experience has provided some insights about possible parameter values
before the first data point has been observed. (If you have no such opinions,
perhaps the wisdom of the person who assigned this task to you should
be questioned.) The difficulty is translating this knowledge into a probability
distribution. An excellent discussion about prior distributions and the
foundations of Bayesian analysis can be found in Lindley [83], and for a
discussion about issues surrounding the choice of Bayesian versus frequentist
methods, see Efron [32]. The book by Klugman [77] contains more detail on
the Bayesian approach along with several actuarial applications. More recent
articles applying Bayesian methods to actuarial problems include [25],
[101], [119], and [133]. A good source for a thorough mathematical treatment
of Bayesian methods is the text by Berger [13]. In recent years many
advancements in Bayesian calculations have occurred. A good resource is [22].
Scollnik [118] has demonstrated how the computer program WINBUGS can
be used to provide Bayesian solutions to actuarial problems.

Due to the difficulty of finding a prior distribution that is convincing (you 
will have to convince others that your prior opinions are valid) and the pos¬ 
sibility that you may really have no prior opinion, the definition of prior 
distribution can be loosened. 

Definition 12.21 An improper prior distribution is one for which the 
probabilities (or pdf) are nonnegative but their sum (or integral) is infinite. 

A great deal of research has gone into the determination of a so-called 
noninformative or vague prior. Its purpose is to reflect minimal knowledge. 
Universal agreement on the best way to construct a vague prior does not exist. 
However, there is agreement that the appropriate noninformative prior for a
scale parameter is $\pi(\theta) = 1/\theta$, $\theta > 0$. Note that this is an improper prior.

For a Bayesian analysis, the model is no different than before. 

Definition 12.22 The model distribution is the probability distribution for
the data as collected given a particular value for the parameter. Its pdf is
denoted $f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta)$, where vector notation for $\mathbf{x}$ is used to remind us that all
the data appear here. Also note that this is identical to the likelihood function
and so that name may also be used at times.

If the vector of observations $\mathbf{x} = (x_1, \ldots, x_n)^T$ consists of independent and
identically distributed random variables, then

\[
f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta) = f_{X|\Theta}(x_1|\theta) \cdots f_{X|\Theta}(x_n|\theta).
\]









We use concepts from multivariate statistics to obtain two more definitions.
In both cases, as well as in the following, integrals should be replaced by sums
if the distributions are discrete.

Definition 12.23 The joint distribution has pdf

\[
f_{\mathbf{X},\Theta}(\mathbf{x}, \theta) = f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta)\,\pi(\theta).
\]

Definition 12.24 The marginal distribution of $\mathbf{x}$ has pdf

\[
f_{\mathbf{X}}(\mathbf{x}) = \int f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta)\,\pi(\theta)\,d\theta.
\]

Compare this definition to that of a mixture distribution given by (4.4) on
page 59. The final two quantities of interest are the following.

Definition 12.25 The posterior distribution is the conditional probability
distribution of the parameters given the observed data. It is denoted $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$.

Definition 12.26 The predictive distribution is the conditional probability
distribution of a new observation $y$ given the data $\mathbf{x}$. It is denoted
$f_{Y|\mathbf{X}}(y|\mathbf{x})$. [10]

These last two items are the key output of a Bayesian analysis. The
posterior distribution tells us how our opinion about the parameter has changed
once we have observed the data. The predictive distribution tells us what
the next observation might look like given the information contained in the
data (as well as, implicitly, our prior opinion). Bayes' theorem tells us how
to compute the posterior distribution.

Theorem 12.27 The posterior distribution can be computed as

\[
\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) =
  \frac{f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta)\,\pi(\theta)}{\int f_{\mathbf{X}|\Theta}(\mathbf{x}|\theta)\,\pi(\theta)\,d\theta}
  \qquad (12.2)
\]

while the predictive distribution can be computed as

\[
f_{Y|\mathbf{X}}(y|\mathbf{x}) = \int f_{Y|\Theta}(y|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta,
  \qquad (12.3)
\]

where $f_{Y|\Theta}(y|\theta)$ is the pdf of the new observation, given the parameter value.

[10] In this section and in any subsequent Bayesian discussions, we reserve $f(\cdot)$ for
distributions concerning observations (such as the model and predictive distributions)
and $\pi(\cdot)$ for distributions concerning parameters (such as the prior and posterior
distributions). The arguments will usually make it clear which particular distribution is
being used. To make matters explicit, we also employ subscripts to enable us to keep
track of the random variables.


The predictive distribution can be interpreted as a mixture distribution 
where the mixing is with respect to the posterior distribution. The following 
example illustrates the above definitions and results. The setting, though not 
the data, is taken from Meyers [92]. 

Example 12.28 The following amounts were paid on a hospital liability
policy:


125 132 141 107 133 319 126 104 145 223 


The amount of a single payment has the single-parameter Pareto distribution
with $\theta = 100$ and $\alpha$ unknown. The prior distribution has the gamma
distribution with $\alpha = 2$ and $\theta = 1$. Determine all of the relevant Bayesian quantities.

The prior density has a gamma distribution and is

\[
\pi(\alpha) = \alpha e^{-\alpha}, \quad \alpha > 0,
\]

while the model is (evaluated at the data points)

\[
f_{\mathbf{X}|A}(\mathbf{x}|\alpha) = \frac{\alpha^{10}(100)^{10\alpha}}{\left(\prod_{j=1}^{10} x_j\right)^{\alpha+1}}
  = \alpha^{10} e^{-3.801121\alpha - 49.852823}.
\]

The joint density of $\mathbf{x}$ and $A$ is (again evaluated at the data points)

\[
f_{\mathbf{X},A}(\mathbf{x}, \alpha) = \alpha^{11} e^{-4.801121\alpha - 49.852823}.
\]

The posterior distribution of $\alpha$ is

\[
\pi_{A|\mathbf{X}}(\alpha|\mathbf{x})
  = \frac{\alpha^{11} e^{-4.801121\alpha - 49.852823}}{\int_0^\infty \alpha^{11} e^{-4.801121\alpha - 49.852823}\,d\alpha}
  = \frac{\alpha^{11} e^{-4.801121\alpha}}{(11!)(1/4.801121)^{12}}.
  \qquad (12.4)
\]

There is no need to evaluate the integral in the denominator. Because we
know that the result must be a probability distribution, the denominator is
just the appropriate normalizing constant. A look at the numerator reveals
that we have a gamma distribution with $\alpha = 12$ and $\theta = 1/4.801121$.

The predictive distribution is

\[
\begin{aligned}
f_{Y|\mathbf{X}}(y|\mathbf{x})
  &= \int_0^\infty \frac{\alpha\,100^{\alpha}}{y^{\alpha+1}}\,
     \frac{\alpha^{11} e^{-4.801121\alpha}}{(11!)(1/4.801121)^{12}}\,d\alpha \\
  &= \frac{1}{y\,(11!)(1/4.801121)^{12}} \int_0^\infty \alpha^{12} e^{-(0.195951 + \ln y)\alpha}\,d\alpha \\
  &= \frac{1}{y\,(11!)(1/4.801121)^{12}}\,\frac{12!}{(0.195951 + \ln y)^{13}} \\
  &= \frac{12(4.801121)^{12}}{y\,(0.195951 + \ln y)^{13}}, \quad y > 100.
  \qquad (12.5)
\end{aligned}
\]

While this density function may not look familiar, you are asked to show in
Exercise 12.67 that $\ln Y - \ln 100$ has the Pareto distribution. □
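
The posterior and predictive quantities of Example 12.28 can be checked numerically. The sketch below assumes the gamma prior and single-parameter Pareto model stated in the example; the variable and function names are hypothetical. It relies on the fact, visible in (12.4), that the posterior is again a gamma distribution whose shape is the prior shape plus the sample size and whose rate is the prior rate plus $\sum \ln(x_j/100)$.

```python
import math

payments = [125, 132, 141, 107, 133, 319, 126, 104, 145, 223]
PARETO_THETA = 100.0                # known single-parameter Pareto theta
PRIOR_SHAPE, PRIOR_SCALE = 2.0, 1.0  # gamma prior on alpha

# Gamma posterior: shape and rate (rate = 1 / scale).
post_shape = PRIOR_SHAPE + len(payments)
post_rate = 1.0 / PRIOR_SCALE + sum(math.log(x / PARETO_THETA) for x in payments)


def predictive_pdf(y):
    """Predictive density of the next payment, matching the form of (12.5)."""
    denom = post_rate + math.log(y / PARETO_THETA)
    return post_shape * post_rate ** post_shape / (y * denom ** (post_shape + 1))


def predictive_sf(y):
    """Predictive probability that the next payment exceeds y (y >= 100)."""
    denom = post_rate + math.log(y / PARETO_THETA)
    return (post_rate / denom) ** post_shape


print(post_shape, post_rate)
print(predictive_pdf(150.0), predictive_sf(150.0))
```

Note that `post_rate + ln(y/100)` equals $0.195951 + \ln y$, which is the quantity appearing in (12.5).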











request. No doubt a specific number, perhaps with a margin for error, 
it is desired. The usual Bayesian solution is to pose a loss function. 


Definition 12.29 A loss function $l_j(\hat\theta_j, \theta_j)$ describes the penalty paid by
the investigator when $\hat\theta_j$ is the estimate and $\theta_j$ is the true value of the $j$th
parameter.

It would also be possible to have a multidimensional loss function $l(\hat\theta, \theta)$
which allowed the loss to depend simultaneously on the errors in the various
parameter estimates.

Definition 12.30 The Bayes estimate for a given loss function is the one 
that minimizes the expected loss given the posterior distribution of the para¬ 
meter in question. 

The three most commonly used loss functions are defined as follows. 

Definition 12.31 For squared-error loss the loss function is (all subscripts
are dropped for convenience) $l(\hat\theta, \theta) = (\hat\theta - \theta)^2$. For absolute loss it is
$l(\hat\theta, \theta) = |\hat\theta - \theta|$. For zero-one loss it is $l(\hat\theta, \theta) = 0$ if $\hat\theta = \theta$ and is 1
otherwise.

The following theorem indicates the Bayes estimates for these three
common loss functions.

Theorem 12.32 For squared-error loss the Bayes estimate is the mean of
the posterior distribution, for absolute loss it is a median, and for zero-one
loss it is a mode.

Note that there is no guarantee that the posterior mean exists or that the 
posterior median or mode will be unique. When not otherwise specified, the 
term Bayes estimate will refer to the posterior mean. 
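
For a gamma posterior such as the one obtained in Example 12.28 (shape 12, rate 4.801121), the three Bayes estimates can be computed as in the sketch below. This is an illustration, not the text's method: it assumes an integer shape so that the gamma cdf has a finite-sum form, and the names `erlang_cdf` and `quantile` are hypothetical. The median is found by bisection.

```python
import math

SHAPE, RATE = 12, 4.801121  # gamma posterior from Example 12.28


def erlang_cdf(x, shape=SHAPE, rate=RATE):
    """Gamma cdf for integer shape: 1 - exp(-z) * sum_{k<shape} z^k / k!."""
    z = rate * x
    term, total = 1.0, 1.0
    for k in range(1, shape):
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total


def quantile(p, lo=0.0, hi=20.0, tol=1e-10):
    """Bisection for the value whose posterior cdf equals p."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


posterior_mean = SHAPE / RATE          # Bayes estimate under squared-error loss
posterior_median = quantile(0.5)       # Bayes estimate under absolute loss
posterior_mode = (SHAPE - 1) / RATE    # Bayes estimate under zero-one loss
print(posterior_mean, posterior_median, posterior_mode)
```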

Example 12.33 (Example 12.28 continued) Determine the three Bayes
estimates of $\alpha$.

The mean of the posterior gamma distribution is $\alpha\theta = 12/4.801121 =
2.499416$. The median of 2.430342 must be determined numerically while the
mode is $(\alpha - 1)\theta = 11/4.801121 = 2.291132$. Note that the $\alpha$ used here is


For forecasting purposes, the expected value of the predictive distribution 
is often of interest. It can be thought of as providing a point estimate of the 
(n+l)th observation given the first n observations and the prior distribution. 
It is 


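\[
\begin{aligned}
E(Y|\mathbf{x}) &= \int y\, f_{Y|\mathbf{X}}(y|\mathbf{x})\,dy \\
  &= \int y \int f_{Y|\Theta}(y|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta\,dy \\
  &= \int \pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) \int y\, f_{Y|\Theta}(y|\theta)\,dy\,d\theta \\
  &= \int E(Y|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta.
  \qquad (12.6)
\end{aligned}
\]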

Equation (12.6) can be interpreted as a weighted average using the posterior 
distribution as weights. 


Example 12.34 (Example 12.28 continued) Determine the expected value of
the 11th observation, given the first 10.

For the single-parameter Pareto distribution, $E(Y|\alpha) = 100\alpha/(\alpha - 1)$ for
$\alpha > 1$. Because the posterior distribution assigns positive probability to values
of $\alpha \le 1$, the expected value of the predictive distribution is not defined. □


The Bayesian equivalent of a confidence interval is easy to construct. The 
following definition will suffice. 

Definition 12.35 The points $a < b$ define a $100(1 - \alpha)\%$ credibility
interval for $\theta_j$ provided that $\Pr(a \le \Theta_j \le b \mid \mathbf{x}) \ge 1 - \alpha$.

The use of the term credibility has no relationship to its use in actuarial
analyses as developed in Chapter 16. The inequality is present for the case
where the posterior distribution of $\theta_j$ is discrete. Then it may not be possible
for the probability to be exactly $1 - \alpha$. This definition does not produce a
unique solution. The following theorem indicates one way to produce a unique
interval.

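Theorem 12.36 If the posterior random variable $\theta_j|\mathbf{x}$ is continuous and
unimodal, then the $100(1 - \alpha)\%$ credibility interval with smallest width $b - a$ is
the unique solution to

$$\int_a^b \pi_{\Theta_j|\mathbf{X}}(\theta_j|\mathbf{x})\,d\theta_j = 1 - \alpha,$$

$$\pi_{\Theta|\mathbf{X}}(a|\mathbf{x}) = \pi_{\Theta|\mathbf{X}}(b|\mathbf{x}).$$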


This interval is a special case of a highest posterior density (HPD) credibility 













Example 12.37 (Example 12.28 continued) Determine the shortest 95%
credibility interval for the parameter $\alpha$. Also determine the interval that places
2.5% probability at each end.

The two equations from Theorem 12.36 are

$$\Pr(a \le A \le b \mid \mathbf{x}) = \Gamma(12; 4.801121b) - \Gamma(12; 4.801121a) = 0.95,$$

$$a^{11}e^{-4.801121a} = b^{11}e^{-4.801121b},$$

and numerical methods can be used to find the solution a = 1.1832 and
b = 3.9384. The width of this interval is 2.7552.
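One way to carry out the numerical work for both intervals requested in the example is nested bisection. This is a sketch under the assumption that the posterior is gamma with shape 12 and rate 4.801121; the function names are illustrative. For each trial lower limit a below the mode, the matching upper limit b with equal density is found first, and a is then adjusted until the enclosed probability is 0.95.

```python
import math

SHAPE, RATE = 12, 4.801121
MODE = (SHAPE - 1) / RATE

def cdf(x):
    """Gamma cdf for integer shape (finite Erlang sum)."""
    z = RATE * x
    term, total = 1.0, 1.0
    for k in range(1, SHAPE):
        term *= z / k
        total += term
    return 1.0 - math.exp(-z) * total

def log_density(x):
    """Log posterior density up to an additive constant."""
    return (SHAPE - 1) * math.log(x) - RATE * x

def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi]; assumes a sign change on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def matching_upper(a):
    """Point b above the mode with the same density as a below the mode."""
    target = log_density(a)
    return bisect(lambda b: log_density(b) - target, MODE, 50.0)

# Shortest (HPD) interval: equal densities at both ends, 95% probability inside.
hpd_a = bisect(lambda a: cdf(matching_upper(a)) - cdf(a) - 0.95, 0.01, 0.999 * MODE)
hpd_b = matching_upper(hpd_a)

# Equal-tailed interval: 2.5% probability below a and above b.
eq_a = bisect(lambda x: cdf(x) - 0.025, 0.01, 20.0)
eq_b = bisect(lambda x: cdf(x) - 0.975, 0.01, 20.0)

print(hpd_a, hpd_b, hpd_b - hpd_a)
print(eq_a, eq_b, eq_b - eq_a)
```

The search brackets (0.01, 50, and 20) are arbitrary choices that comfortably contain the solutions for this posterior.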

Placing 2.5% probability at each end yields the two equations 


$$\Gamma(12; 4.801121b) = 0.975, \qquad \Gamma(12; 4.801121a) = 0.025.$$


This solution requires either access to the inverse of the incomplete gamma 
function or the use of root-finding techniques with the incomplete gamma 
function itself. The solution is a = 1.2915 and b = 4.0995. The width is 
2.8080, wider than the first interval. Figure 12.1 shows the difference in the 

IPD interval. The total 
^ther 95% interval must 
).025 probability on each 
ict the same probability 
Lt limit must be moved a 
r over that interval than 


by inverting the matrix of second partial derivatives of the ne 
of the posterior density. 


$$-\frac{d^2}{d\alpha^2}\ln \pi_{A|\mathbf{X}}(\alpha|\mathbf{x}) = \frac{11}{\alpha^2}.$$

The variance estimate is the reciprocal. Evaluated at the modal estimate of
$\alpha$ we get $(2.291132)^2/11 = 0.477208$ for a credibility interval of $2.29113 \pm
1.96(0.477208)^{1/2}$, which produces a = 0.9372 and b = 3.6451. □
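The arithmetic behind this normal approximation is short enough to sketch. The code assumes the posterior density is proportional to $\alpha^{11}e^{-4.801121\alpha}$ (a gamma with shape 12), so that the negative second derivative of its logarithm is $11/\alpha^2$; the variable names are illustrative.

```python
import math

SHAPE, RATE = 12, 4.801121

mode = (SHAPE - 1) / RATE
# Negative second derivative of (SHAPE - 1) * ln(a) - RATE * a, evaluated at the mode.
information = (SHAPE - 1) / mode ** 2
variance = 1.0 / information                 # reciprocal, as in the text
half_width = 1.96 * math.sqrt(variance)
lower, upper = mode - half_width, mode + half_width

print(variance, lower, upper)
```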

The same concepts can apply to the predictive distribution. However, 
the Bayesian central limit theorem does not help here because the predictive 
sample has only one member. The only potential use for it is that for a large 
original sample size we can replace the true posterior distribution in (12.3) 
with a multivariate normal distribution. 

Example 12.41 (Example 12.28 continued) Construct a 95% highest density
prediction interval for the next observation.

It is easy to see that the predictive density function (12.5) is strictly
decreasing. Therefore the region with highest density runs from a = 100 to b.
The value of b is determined from

$$
\begin{aligned}
0.95 &= \int_{100}^{b} \frac{12(4.801121)^{12}}{y(0.195951 + \ln y)^{13}}\,dy \\
&= \int_{0}^{\ln(b/100)} \frac{12(4.801121)^{12}}{(4.801121 + x)^{13}}\,dx \\
&= 1 - \left[\frac{4.801121}{4.801121 + \ln(b/100)}\right]^{12},
\end{aligned}
$$

and the solution is b = 390.1840. It is interesting to note that the mode of
the predictive distribution is 100 (because the pdf is strictly decreasing) while
the mean is infinite (with $b = \infty$ and an additional y in the integrand, after
the transformation, the integrand is like $e^{x}x^{-13}$, which goes to infinity as x
goes to infinity). □
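Because the last line of the display is in closed form, b can be obtained by direct inversion. The snippet below is a small check under the stated posterior parameters (shape 12, rate 4.801121); `predictive_cdf` is an illustrative name for the closed-form expression.

```python
import math

SHAPE, RATE = 12, 4.801121

def predictive_cdf(y):
    """Pr(Y <= y | x) for y >= 100, from the closed form in the display above."""
    return 1.0 - (RATE / (RATE + math.log(y / 100.0))) ** SHAPE

# Solve 0.95 = 1 - [RATE / (RATE + ln(b/100))]^SHAPE for b.
b = 100.0 * math.exp(RATE * (0.05 ** (-1.0 / SHAPE) - 1.0))

print(b, predictive_cdf(b))
```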

The following example revisits a calculation done in Section 4.6.3. There 
the negative binomial distribution was derived as a gamma mixture of Poisson 
variables. The following example shows how the same calculations arise in a 
Bayesian context. 

Example 12.42 The number of claims in one year on a given policy is known
to have a Poisson distribution. The parameter is not known, but the prior
distribution has a gamma distribution with parameters $\alpha$ and $\theta$. Suppose in
the past year the policy had x claims. Use Bayesian methods to estimate the
number of claims in the next year. Then repeat these calculations assuming
claim counts for the past n years, $x_1, \ldots, x_n$.
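The key distributions are (where x = 0, 1, ...):

Prior:
$$\pi(\lambda) = \frac{\lambda^{\alpha-1}e^{-\lambda/\theta}}{\Gamma(\alpha)\theta^{\alpha}}$$

Model:
$$p(x|\lambda) = \frac{\lambda^{x}e^{-\lambda}}{x!}$$

Joint:
$$p(x, \lambda) = \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}}{x!\,\Gamma(\alpha)\theta^{\alpha}}$$

Marginal:
$$
\begin{aligned}
p(x) &= \int_0^\infty \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}}{x!\,\Gamma(\alpha)\theta^{\alpha}}\,d\lambda
= \frac{\Gamma(x+\alpha)}{x!\,\Gamma(\alpha)\theta^{\alpha}(1+1/\theta)^{x+\alpha}} \\
&= \binom{x+\alpha-1}{x}\left(\frac{1}{1+\theta}\right)^{\alpha}\left(\frac{\theta}{1+\theta}\right)^{x}
\end{aligned}
$$

Posterior:
$$
\pi(\lambda|x) = \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}}{x!\,\Gamma(\alpha)\theta^{\alpha}}
\bigg/ \frac{\Gamma(x+\alpha)}{x!\,\Gamma(\alpha)\theta^{\alpha}(1+1/\theta)^{x+\alpha}}
= \frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}(1+1/\theta)^{x+\alpha}}{\Gamma(x+\alpha)}
$$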

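The marginal distribution is negative binomial with $r = \alpha$ and $\beta = \theta$. The
posterior distribution is gamma with shape parameter "$\alpha$" equal to $x + \alpha$ and
scale parameter "$\theta$" equal to $(1 + 1/\theta)^{-1} = \theta/(1 + \theta)$. The Bayes estimate
of the Poisson parameter is the posterior mean, $(x + \alpha)\theta/(1 + \theta)$. For the
predictive distribution, (12.3) gives

$$
\begin{aligned}
p(y|x) &= \int_0^\infty \frac{\lambda^{y}e^{-\lambda}}{y!}\,
\frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}(1+1/\theta)^{x+\alpha}}{\Gamma(x+\alpha)}\,d\lambda \\
&= \frac{(1+1/\theta)^{x+\alpha}}{y!\,\Gamma(x+\alpha)}\int_0^\infty \lambda^{y+x+\alpha-1}e^{-(2+1/\theta)\lambda}\,d\lambda \\
&= \frac{(1+1/\theta)^{x+\alpha}\,\Gamma(y+x+\alpha)}{y!\,\Gamma(x+\alpha)(2+1/\theta)^{y+x+\alpha}}, \quad y = 0, 1, \ldots,
\end{aligned}
$$

and some rearranging shows this to be a negative binomial distribution with
$r = x + \alpha$ and $\beta = \theta/(1 + \theta)$. The expected number of claims for the next
year is $(x + \alpha)\theta/(1 + \theta)$. Alternatively, from (12.6),

$$E(Y|x) = \int_0^\infty \lambda\,\frac{\lambda^{x+\alpha-1}e^{-(1+1/\theta)\lambda}(1+1/\theta)^{x+\alpha}}{\Gamma(x+\alpha)}\,d\lambda = \frac{(x+\alpha)\theta}{1+\theta}.$$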
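The "some rearranging" can be checked numerically. The sketch below evaluates the gamma-function form of the predictive pf and the negative binomial pf side by side; the prior parameters and the observed count are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
import math

def predictive_pf(y, x, alpha, theta):
    """p(y | x) from the gamma-function expression, computed on the log scale."""
    a = x + alpha
    log_p = (a * math.log(1 + 1 / theta) + math.lgamma(y + a)
             - math.lgamma(y + 1) - math.lgamma(a)
             - (y + a) * math.log(2 + 1 / theta))
    return math.exp(log_p)

def negbin_pf(y, r, beta):
    """Negative binomial pf with parameters r and beta (mean r * beta)."""
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             - r * math.log(1 + beta)
             + y * (math.log(beta) - math.log(1 + beta)))
    return math.exp(log_p)

# Hypothetical inputs: gamma prior (alpha, theta) and x claims observed last year.
alpha, theta, x = 2.0, 0.5, 3
r, beta = x + alpha, theta / (1 + theta)

probs = [predictive_pf(y, x, alpha, theta) for y in range(200)]
max_gap = max(abs(p - negbin_pf(y, r, beta)) for y, p in enumerate(probs))
predictive_mean = sum(y * p for y, p in enumerate(probs))

print(max_gap, predictive_mean, (x + alpha) * theta / (1 + theta))
```

Truncating the sum at 200 terms is an arbitrary cutoff; the omitted tail is negligible for these parameter values.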

For a sample of size n, the key change is that the model distribution is now

$$p(\mathbf{x}|\lambda) = \frac{\lambda^{x_1+\cdots+x_n}e^{-n\lambda}}{x_1!\cdots x_n!}.$$








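$$
\begin{aligned}
\mathrm{Var}(Y|\mathbf{x}) &= E_{\Theta|\mathbf{x}}[\mathrm{Var}(Y|\Theta, \mathbf{x})] + \mathrm{Var}_{\Theta|\mathbf{x}}[E(Y|\Theta, \mathbf{x})] \\
&= E_{\Theta|\mathbf{x}}[\mathrm{Var}(Y|\Theta)] + \mathrm{Var}_{\Theta|\mathbf{x}}[E(Y|\Theta)].
\end{aligned}
$$

The simplification on the inner expected value and variance results from the
fact that, if $\Theta$ is known, the value of $\mathbf{x}$ provides no additional information
about the distribution of Y. This is simply a restatement of (12.6).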

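Example 12.43 Apply these formulas to obtain the predictive mean and
variance for the previous example. Then anticipate the credibility formulas of
Chapter 16.

The predictive mean uses $E(Y|\lambda) = \lambda$. Then,

$$E(Y|\mathbf{x}) = E(\lambda|\mathbf{x}) = \frac{(n\bar{x} + \alpha)\theta}{1 + n\theta}.$$

The predictive variance uses $\mathrm{Var}(Y|\lambda) = \lambda$, and then

$$\mathrm{Var}(Y|\mathbf{x}) = E(\lambda|\mathbf{x}) + \mathrm{Var}(\lambda|\mathbf{x})
= \frac{(n\bar{x} + \alpha)\theta}{1 + n\theta} + \frac{(n\bar{x} + \alpha)\theta^2}{(1 + n\theta)^2}.$$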
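As a numerical illustration of the two predictive moments, the sketch below evaluates them for a small hypothetical claims history and also rewrites the mean as a weighted average of the sample mean and the prior mean $\alpha\theta$. The weight `z` is notation introduced here for illustration; the identity itself is plain algebra on the predictive mean.

```python
def predictive_moments(claims, alpha, theta):
    """Predictive mean and variance for the Poisson model with a gamma prior."""
    n, total = len(claims), sum(claims)
    post_shape = total + alpha                  # n * xbar + alpha
    post_scale = theta / (1 + n * theta)
    mean = post_shape * post_scale              # E(lambda | x)
    variance = mean + post_shape * post_scale ** 2   # E(lambda|x) + Var(lambda|x)
    return mean, variance

def weighted_average_form(claims, alpha, theta):
    """Same predictive mean written as z * xbar + (1 - z) * (prior mean)."""
    n = len(claims)
    xbar = sum(claims) / n
    z = n * theta / (1 + n * theta)
    return z * xbar + (1 - z) * alpha * theta

# Hypothetical five-year history and prior parameters.
claims, alpha, theta = [0, 1, 0, 2, 1], 2.0, 0.5
mean, variance = predictive_moments(claims, alpha, theta)

print(mean, variance, weighted_average_form(claims, alpha, theta))
```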



These agree with the mean and variance of the known negative binomial 
distribution for Y. However, these quantities were obtained from moments
of the model (Poisson) and posterior (gamma) distributions. The predictive 


Example 12.45 Show that the exponential distribution is of the form (12.10).

$$f(x; \beta) = \beta^{-1}e^{-x/\beta}.$$

If we let $\theta = 1/\beta$, then the pdf is

$$f(x; \theta) = \theta e^{-\theta x},$$

which is of the form (12.10) with $p(x) = 1$ and $q(\theta) = 1/\theta$. □

Example 12.46 Show that the Poisson distribution is a member of the linear
exponential family.

$$f(x; \lambda) = \frac{\lambda^{x}e^{-\lambda}}{x!}.$$

If we let $\theta = -\ln\lambda$, then the pf is

$$f(x; \theta) = \frac{e^{-\theta x}\exp(-e^{-\theta})}{x!},$$

which is of the form (12.10) with $p(x) = 1/x!$ and $q(\theta) = \exp(e^{-\theta})$. Note that in
this parameterization the Poisson mean is $e^{-\theta}$. □

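Example 12.47 Show that the normal distribution with mean $\mu$ and known
variance v is a member of the linear exponential family.

$$
\begin{aligned}
f(x; \mu, v) &= (2\pi v)^{-1/2}\exp\left[-\frac{1}{2v}(x - \mu)^2\right] \\
&= (2\pi v)^{-1/2}\exp\left(-\frac{x^2}{2v} + \frac{\mu}{v}x - \frac{\mu^2}{2v}\right).
\end{aligned}
$$

If we let $\theta = -\mu/v$, the pdf is

$$f(x; \theta) = \frac{(2\pi v)^{-1/2}\exp\left(-\dfrac{x^2}{2v}\right)\exp(-\theta x)}{\exp\left(\dfrac{\theta^2 v}{2}\right)},$$

which is of the form (12.10) with $p(x) = (2\pi v)^{-1/2}\exp[-x^2/(2v)]$ and $q(\theta) =
\exp(\theta^2 v/2)$. □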


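We now find the mean and variance of the distribution defined by (12.10).
First, note that

$$\ln f(x; \theta) = \ln p(x) - \theta x - \ln q(\theta).$$

Differentiate with respect to $\theta$ to obtain

$$\frac{\partial}{\partial\theta}f(x; \theta) = \left[-x - \frac{q'(\theta)}{q(\theta)}\right]f(x; \theta). \qquad (12.11)$$

Integrate (or sum) over the range of x (known not to depend on $\theta$) to obtain

$$\int \frac{\partial}{\partial\theta}f(x; \theta)\,dx = -\int x f(x; \theta)\,dx - \frac{q'(\theta)}{q(\theta)}\int f(x; \theta)\,dx.$$

On the left-hand side, interchange the order of differentiation and integration
(or summation) to obtain

$$\frac{\partial}{\partial\theta}\int f(x; \theta)\,dx = -\int x f(x; \theta)\,dx - \frac{q'(\theta)}{q(\theta)}\int f(x; \theta)\,dx.$$

We know that $\int f(x; \theta)\,dx = 1$ and $\int x f(x; \theta)\,dx = E(X)$ and thus

$$\frac{\partial}{\partial\theta}(1) = -E(X) - \frac{q'(\theta)}{q(\theta)}.$$

In other words, the mean is

$$E(X) = \mu(\theta) = -\frac{q'(\theta)}{q(\theta)} = -\frac{d}{d\theta}\ln q(\theta). \qquad (12.12)$$

To obtain the variance, (12.11) may first be rewritten as

$$\frac{\partial}{\partial\theta}f(x; \theta) = -[x - \mu(\theta)]f(x; \theta).$$

Differentiate again with respect to $\theta$ to obtain

$$\frac{\partial^2}{\partial\theta^2}f(x; \theta) = \mu'(\theta)f(x; \theta) + [x - \mu(\theta)]^2 f(x; \theta).$$

Again, integrate over the range of x to obtain

$$\int \frac{\partial^2}{\partial\theta^2}f(x; \theta)\,dx = \mu'(\theta)\int f(x; \theta)\,dx + \int [x - \mu(\theta)]^2 f(x; \theta)\,dx.$$

In other words,

$$\int [x - \mu(\theta)]^2 f(x; \theta)\,dx = -\mu'(\theta) + \frac{\partial^2}{\partial\theta^2}\int f(x; \theta)\,dx.$$

Because $\mu(\theta)$ is the mean, the left-hand side is the variance (by definition),
and then because the second term on the right-hand side is zero, we obtain

$$\mathrm{Var}(X) = v(\theta) = -\mu'(\theta) = \frac{d^2}{d\theta^2}\ln q(\theta). \qquad (12.13)$$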
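Formulas (12.12) and (12.13) can be sanity-checked with finite differences. The sketch below uses the Poisson case from Example 12.46, where $\ln q(\theta) = e^{-\theta}$ and both the mean and the variance should equal $\lambda = e^{-\theta}$; the step size `h` and the value of `lam` are arbitrary choices made for this illustration.

```python
import math

def ln_q(theta):
    """ln q(theta) for the Poisson member of the family: q(theta) = exp(e^{-theta})."""
    return math.exp(-theta)

def first_derivative(f, t, h=1e-4):
    """Central-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t, h=1e-4):
    """Central-difference approximation to f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

lam = 2.5                       # illustrative Poisson mean
theta = -math.log(lam)          # parameterization of Example 12.46

mean = -first_derivative(ln_q, theta)        # (12.12)
variance = second_derivative(ln_q, theta)    # (12.13)

print(mean, variance)
```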

In Example 12.42 it turned out the posterior distribution was of the same 
type as the prior distribution (gamma). This makes calculations relatively 
easy. A definition of this concept follows. 

Definition 12.48 A prior distribution is said to be a conjugate prior
distribution for a given model if the resulting posterior distribution is of the
same type as the prior (but perhaps with different parameters).







are independent and identically distributed with pf 




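$$f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j)e^{-\theta x_j}}{q(\theta)},$$

where $\Theta$ has pdf

$$\pi(\theta) = \frac{[q(\theta)]^{-k}e^{-\theta\mu k}}{c(\mu, k)},$$

where k and $\mu$ are parameters of the distribution and $c(\mu, k)$ is the normalizing
constant. Then the posterior pf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ is of the same form as $\pi(\theta)$.

Proof: The posterior distribution is

$$
\begin{aligned}
\pi(\theta|\mathbf{x}) &\propto \frac{\left[\prod_{j=1}^{n}p(x_j)\right]e^{-\theta\sum x_j}}{[q(\theta)]^{n}}\cdot\frac{[q(\theta)]^{-k}e^{-\theta\mu k}}{c(\mu, k)} \\
&\propto [q(\theta)]^{-(k+n)}\exp\left[-\theta\left(\frac{\mu k + \sum x_j}{k + n}\right)(k + n)\right] \\
&= [q(\theta)]^{-k^*}\exp(-\theta\mu^* k^*),
\end{aligned}
$$

which is of the same form as $\pi(\theta)$ with parameters

$$k^* = k + n, \qquad \mu^* = \frac{\mu k + \sum x_j}{k + n} = \frac{k}{k + n}\mu + \frac{n}{k + n}\bar{x}.$$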


in Theorem 12.49 is the gamma distribution. 

From Example 12.46 we have that $q(\theta) = \exp(e^{-\theta})$ and $\lambda = \exp(-\theta)$. The
prior as given by the theorem is

$$\pi(\theta) \propto [\exp(e^{-\theta})]^{-k}\exp(-\theta\mu k).$$

Then the prior density for $\lambda$ is

$$\pi(\lambda) \propto [\exp(\lambda)]^{-k}\lambda^{\mu k}\lambda^{-1} = \lambda^{\mu k - 1}e^{-k\lambda},$$

which is a gamma distribution with $\alpha = \mu k$ and $\theta = 1/k$. The term $\lambda^{-1}$
appears because it is $|d\theta/d\lambda|$, which is needed for the change of variable. □
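The parameter update of Theorem 12.49 and the gamma correspondence just derived can be combined in a few lines. The following sketch uses hypothetical prior values and a single hypothetical observation; the function names are illustrative. It confirms that updating $(k, \mu)$ reproduces the posterior gamma parameters found in Example 12.42.

```python
def update_k_mu(k, mu, xs):
    """Posterior (k*, mu*) from Theorem 12.49 for observations xs."""
    n = len(xs)
    k_star = k + n
    mu_star = (mu * k + sum(xs)) / k_star
    return k_star, mu_star

def gamma_parameters(k, mu):
    """Gamma (alpha, theta) implied for the Poisson mean: alpha = mu*k, theta = 1/k."""
    return mu * k, 1.0 / k

# Hypothetical gamma prior for the Poisson mean, and one year with 3 claims.
alpha, theta = 2.0, 0.5
k, mu = 1.0 / theta, alpha * theta
k_star, mu_star = update_k_mu(k, mu, [3])
post_alpha, post_theta = gamma_parameters(k_star, mu_star)

print(post_alpha, post_theta)
```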

12.4.4 Computational issues 

It should be obvious by now that all Bayesian analyses proceed by taking
integrals or sums. So at least conceptually it is always possible to do a Bayesian
analysis. However, only in rare cases are the integrals or sums easy to do, and
that means most Bayesian analyses will require numerical integration. While
one-dimensional integrations are easy to do to a high degree of accuracy,
multidimensional integrals are much more difficult to approximate. A great deal
of effort has been expended with regard to solving this problem. A number
of ingenious methods have been developed. Some of them are summarized in
Klugman [77]. However, the one that is widely used today is called Markov
chain Monte Carlo simulation. A good discussion of this method can be found
in [118] and actuarial applications can be found in [21] and [119].

There is another way which completely avoids computational problems. 
This is illustrated using the example (in an abbreviated form) from Meyers 
[92], which also employed this technique. The example also shows how a 
Bayesian analysis is used to estimate a function of parameters. 


Example 12.51 Data were collected on 100 losses in excess of 100,000. The
single-parameter Pareto distribution is to be used with $\theta = 100{,}000$ and $\alpha$
unknown. The objective is to estimate the layer average severity for the layer
from 1,000,000 to 5,000,000. For the observations, $\sum_{j=1}^{100}\ln x_j = 1{,}208.4354$.

The model density is

$$
\begin{aligned}
f_{\mathbf{X}|A}(\mathbf{x}|\alpha) &= \prod_{j=1}^{100}\frac{\alpha(100{,}000)^{\alpha}}{x_j^{\alpha+1}} \\
&= \exp\left[100\ln\alpha + 100\alpha\ln 100{,}000 - (\alpha + 1)\sum_{j=1}^{100}\ln x_j\right] \\
&= \exp\left(100\ln\alpha - \frac{100\alpha}{1.75} - 1{,}208.4354\right).
\end{aligned}
$$


The density appears in column 3 of Table 12.6. To prevent computer overflow,
the value 1,208.4354 was not subtracted prior to exponentiation. This makes
the entries proportional to the true density function. The prior density is
given in the second column. It was chosen based on a belief that the true
value is in the range 1-2.5 and is more likely to be near 1.5 than at the ends.
The posterior density is then obtained using (12.2). The elements of the
numerator are found in column 4. The denominator is no longer an integral
but a sum. The sum is at the bottom of column 4 and then the scaled values
are in column 5.
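The column-by-column calculation described above can be sketched as follows. The grid of $\alpha$ values and the prior weights below are hypothetical stand-ins, not the entries of Table 12.6; only the sample size, the sum of log losses, $\theta$, and the layer limits come from the example. The layer average severity is computed as the integral of the Pareto survival function over the layer, which is an assumption about how that quantity is evaluated.

```python
import math

N_LOSSES = 100
SUM_LOG_LOSSES = 1208.4354            # sum of ln x_j, from the example
LOG_THETA = math.log(100_000)

def scaled_likelihood(alpha):
    """Model density with the constant factor exp(-SUM_LOG_LOSSES) left out."""
    exponent = (N_LOSSES * math.log(alpha)
                + alpha * (N_LOSSES * LOG_THETA - SUM_LOG_LOSSES))
    return math.exp(exponent)

def layer_average_severity(alpha, theta=100_000.0,
                           lower=1_000_000.0, upper=5_000_000.0):
    """Integral of the survival function (theta/x)^alpha over [lower, upper]."""
    if abs(alpha - 1.0) < 1e-12:
        return theta * math.log(upper / lower)
    return (lower * (theta / lower) ** alpha
            - upper * (theta / upper) ** alpha) / (alpha - 1.0)

# HYPOTHETICAL discrete prior on 1.00, 1.05, ..., 2.50, peaked at 1.5.
grid = [round(1.0 + 0.05 * i, 2) for i in range(31)]
raw_prior = [1.1 - abs(a - 1.5) for a in grid]
prior = [w / sum(raw_prior) for w in raw_prior]

numerator = [p * scaled_likelihood(a) for a, p in zip(grid, prior)]
posterior = [v / sum(numerator) for v in numerator]      # denominator is a sum

posterior_mean_alpha = sum(a * w for a, w in zip(grid, posterior))
posterior_mean_las = sum(layer_average_severity(a) * w
                         for a, w in zip(grid, posterior))
print(posterior_mean_alpha, posterior_mean_las)
```

Because the prior is replaced by a discrete distribution, every posterior quantity becomes a finite weighted sum, which is what removes the need for numerical integration.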

We can see from column 5 that the posterior mode is at $\alpha = 1.7$, as
compared to the maximum likelihood estimate of 1.75 (see Exercise 12.69).
The posterior mean of $\alpha$ could be found by adding the product of columns 1
and 5. Here we are interested in a layer average severity. For this problem it




















We can also use columns 5 and 6 to construct a credibility interval.
Discarding the first five rows and the last four rows eliminates 0.0406 of posterior
probability. That leaves (5,992, 34,961) as a 96% credibility interval for the
layer average severity. Part of Meyers' paper was the observation that even
12.67 Show that, if Y is the predictive distribution in Example 12.28, then
$\ln Y - \ln 100$ has the Pareto distribution.

12.68 Determine the posterior distribution of $\alpha$ in Example 12.28 if the prior
distribution is an arbitrary gamma distribution. To avoid confusion, denote
the first parameter of this gamma distribution by $\gamma$. Next determine a
particular combination of gamma parameters so that the posterior mean is the
maximum likelihood estimate of $\alpha$ regardless of the specific values of $x_1, \ldots, x_n$.
Is this prior improper?

12.69 For Example 12.51 demonstrate that the maximum likelihood estimate
of $\alpha$ is 1.75.

12.70 Let $x_1, \ldots, x_n$ be a random sample from a lognormal distribution with
unknown parameters $\mu$ and $\sigma$. Let the prior density be $\pi(\mu, \sigma) = \sigma^{-1}$.

(a) Write the posterior pdf of $\mu$ and $\sigma$ up to a constant of
proportionality.

(b) Determine Bayesian estimators of $\mu$ and $\sigma$ by using the posterior
mode.

(c) Fix $\sigma$ at the posterior mode as determined in part (b) and then
determine the exact (conditional) pdf of $\mu$. Then use it to determine
a 95% HPD credibility interval for $\mu$.

12.71 A random sample of size 100 has been taken from a gamma distribution
with $\alpha$ known to be 2, but $\theta$ unknown. For this sample, $\sum_{j=1}^{100} x_j = 30{,}000$.
The prior distribution for $\theta$ is inverse gamma with $\beta$ taking the role of $\alpha$ and
$\lambda$ taking the role of $\theta$.

(a) Determine the exact posterior distribution of $\theta$. At this point the
values of $\beta$ and $\lambda$ have yet to be specified.

(b) The population mean is $2\theta$. Determine the posterior mean of $2\theta$
using the prior distribution first with $\beta = \lambda = 0$ [this is equivalent
to $\pi(\theta) = \theta^{-1}$] and then with $\beta = 2$ and $\lambda = 250$ (which is a prior
mean of 250). Then, in each case, determine a 95% credibility
interval with 2.5% probability on each side.

(c) Determine the posterior variance of $2\theta$ and use the Bayesian central
limit theorem to construct a 95% credibility interval for $2\theta$ using
each of the two prior distributions given in part (b).
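12.72 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and binomially distributed with pf

$$f_{X_j|\Theta}(x_j|\theta) = \binom{K_j}{x_j}\theta^{x_j}(1 - \theta)^{K_j - x_j}, \quad x_j = 0, 1, \ldots, K_j,$$

and $\Theta$ itself is beta distributed with parameters a and b and pdf

$$\pi(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1 - \theta)^{b-1}, \quad 0 < \theta < 1.$$

(a) Verify that the marginal pf of $X_j$ is

$$f_{X_j}(x_j) = \frac{\dbinom{-a}{x_j}\dbinom{-b}{K_j - x_j}}{\dbinom{-a-b}{K_j}}, \quad x_j = 0, 1, \ldots, K_j,$$

and $E(X_j) = aK_j/(a + b)$. This distribution is termed the
binomial-beta or negative hypergeometric distribution.

(b) Determine the posterior pdf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean
$E(\Theta|\mathbf{x})$.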

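12.73 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and identically exponentially distributed with pdf

$$f_{X_j|\Theta}(x_j|\theta) = \theta e^{-\theta x_j}, \quad x_j > 0,$$

and $\Theta$ is itself gamma distributed with parameters $\alpha > 1$ and $\beta > 0$,

$$\pi(\theta) = \frac{\theta^{\alpha-1}e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}, \quad \theta > 0.$$

(a) Verify that the marginal pdf of $X_j$ is

$$f_{X_j}(x_j) = \alpha\beta^{-\alpha}(\beta^{-1} + x_j)^{-\alpha-1}, \quad x_j > 0.$$

This distribution is one form of the Pareto distribution.

(b) Determine the posterior pdf $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean
$E(\Theta|\mathbf{x})$.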

12.74 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and identically negative binomially distributed with parameters r
and $\theta$ with pf

$$f_{X_j|\Theta}(x_j|\theta) = \binom{r + x_j - 1}{x_j}\theta^{r}(1 - \theta)^{x_j}, \quad x_j = 0, 1, 2, \ldots,$$

and $\Theta$ itself is beta distributed with parameters a and b and pdf

$$\pi(\theta) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1 - \theta)^{b-1}, \quad 0 < \theta < 1.$$

(a) Verify that the marginal pf of $X_j$ is

$$f_{X_j}(x_j) = \frac{\Gamma(r + x_j)}{\Gamma(r)\,x_j!}\,\frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,\frac{\Gamma(a + r)\Gamma(b + x_j)}{\Gamma(a + r + b + x_j)}, \quad x_j = 0, 1, 2, \ldots.$$

This distribution is termed the generalized Waring distribution.
The special case where b = 1 is the Waring distribution
and the Yule distribution if r = 1 and b = 1.

(b) Determine the posterior pdf $f_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean
$E(\Theta|\mathbf{x})$.

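12.75 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent and identically normally distributed with mean $\mu$ and variance $\theta^{-1}$
and $\Theta$ is gamma distributed with parameters $\alpha$ and ($\theta$ replaced by) $1/\beta$.

(a) Verify that the marginal pdf of $X_j$ is

$$f_{X_j}(x_j) = \frac{\Gamma(\alpha + \tfrac{1}{2})}{\sqrt{2\pi\beta}\,\Gamma(\alpha)}\left[1 + \frac{(x_j - \mu)^2}{2\beta}\right]^{-\alpha-1/2}, \quad -\infty < x_j < \infty,$$

which is a form of the t-distribution.

(b) Determine the posterior pdf $f_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ and the posterior mean
$E(\Theta|\mathbf{x})$.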

12.76 Prove that the binomial distribution with pf 


is of the form (12.10) and identify $\theta$, $p(x)$, and $q(\theta)$.

12.77 Consider the negative binomial distribution with pf 

m fPY 


If a is fixed, show that $f(x; a, \beta)$ is of the form (12.10) and identify $\theta$, $p(x)$,
and $q(\theta)$.
$X_n$ are independent and identically distributed with


is the sample mean. In other words, if $\hat{\theta}$ is the maximum likelihood estimator
of $\theta$, prove that

$$\widehat{\mu(\theta)} = \mu(\hat{\theta}) = \bar{x}.$$


12.79 Consider the generalization of (12.10) given by 


where m is a known parameter. Prove that the mean is still given by (12.12)
but the variance is given by $v(\theta)/m$, where $v(\theta)$ is given by (12.13).


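12.80 Let $X_1, \ldots, X_n$ be independent and identically distributed conditional
on $\Theta$ with pf

$$f_{X_j|\Theta}(x_j|\theta) = \frac{p(x_j)e^{-\theta x_j}}{q(\theta)}.$$

Let $S = X_1 + \cdots + X_n$.

(a) Show that, conditional on $\Theta$, S has pf of the form

$$f_{S|\Theta}(s|\theta) = \frac{p_n(s)e^{-\theta s}}{[q(\theta)]^{n}},$$

where $p_n(s)$ does not depend on $\theta$.

(b) Prove that the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ is the same as the
(conditional) distribution of $\Theta|S$,

$$\pi_{\Theta|S}(\theta|s) = \frac{f_{S|\Theta}(s|\theta)\pi(\theta)}{f_S(s)},$$

where $\pi(\theta)$ is the pf of $\Theta$ and $f_S(s)$ is the marginal pf of S.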


12.81 Suppose that given N the random variable X is binomially distributed 
with parameters N and p. 


(a) Show that, if N is Poisson distributed, so is X (unconditionally)
and identify the parameters.

(b) Show that, if N is binomially distributed, so is X (unconditionally)
and identify the parameters.

(c) Show that, if N is negative binomially distributed, so is X
(unconditionally) and identify the parameters.


12.82 (*) A die is selected at random from an urn that < 




yielded the numbers 2, 3, 4, 1, and 4 in that order. Determine the probability 
that the selected die was die number 2. 


12.83 (*) The number of claims in a year, Y, has a distribution which depends
on a parameter $\theta$. As a random variable, $\Theta$ has the uniform distribution on
the interval (0,1). The unconditional probability that Y is 0 is greater than
0.35. For each conditional pf given below, determine if it is possible that it is
the true conditional pf of Y.

(a) $\Pr(Y = y|\theta) = e^{-\theta}\theta^{y}/y!$.

(b) $\Pr(Y = y|\theta) = (y + 1)\theta^{2}(1 - \theta)^{y}$.

(c) $\Pr(Y = y|\theta) = \binom{2}{y}\theta^{y}(1 - \theta)^{2-y}$.

12.84 (*) Your prior distribution concerning the unknown value of H is 
Pr(if = |) = # and Pr(JT = 每 ） =The observation from a single ex¬ 
periment has distribution Pr(£) = d\H = h) = h d (l - h) 1 ^ for d= 0,1. The 
result of a single experiment is d = 1. Determine the posterior distribution of 
H. 

12.85 (*) The number of claims in one year, Y, has the Poisson distribution
with parameter $\theta$. The parameter $\theta$ has the exponential distribution with pdf
$\pi(\theta) = e^{-\theta}$. A particular insured had no claims in one year. Determine the
posterior distribution of $\theta$ for this insured.

12.86 (*) The number of claims in one year, Y, has the Poisson distribution
with parameter $\theta$. The prior distribution has the gamma distribution with
pdf $\pi(\theta) = \theta e^{-\theta}$. There was one claim in one year. Determine the posterior
pdf of $\theta$.

12.87 (*) Each individual car's claim count has a Poisson distribution with
parameter $\lambda$. All individual cars have the same parameter. The prior
distribution is gamma with parameters $\alpha = 50$ and $\theta = 1/500$. In a two-year
period, the insurer covers 750 and 1,100 cars in years 1 and 2, respectively.
There were 65 and 112 claims in years one and two, respectively. Determine
the coefficient of variation of the posterior gamma distribution.

12.88 (*) The number of claims, r, made by an individual in one year has the
binomial distribution with pf $f(r) = \binom{3}{r}\theta^{r}(1 - \theta)^{3-r}$. The prior distribution
for $\theta$ has pdf $\pi(\theta) = 6(\theta - \theta^{2})$. There was one claim in a one-year period.
Determine the posterior pdf of $\theta$.

12.89 (*) The number of claims for an individual in one year has a Poisson
distribution with parameter $\lambda$. The prior distribution for $\lambda$ has the gamma









total of 110 claims has been observed. In each year there were 310 policies in
force. Determine the expected value and variance of the posterior distribution
of $\lambda$.

12.90 (*) The number of claims for an individual in one year has a Poisson
distribution with parameter $\lambda$. The prior distribution for $\lambda$ is exponential with
an expected value of 2. There were three claims in the first year. Determine
the posterior distribution of $\lambda$.

12.91 (*) The number of claims in one year has the binomial distribution
with n = 3 and $\theta$ unknown. The prior distribution for $\theta$ is beta with pdf
$\pi(\theta) = 280\theta^{3}(1 - \theta)^{4}$, $0 < \theta < 1$. Two claims were observed. Determine
each of the following.

(a) The posterior distribution of $\theta$.

(b) The expected value of $\theta$ from the posterior distribution.

12.92 (*) An individual risk has exactly one claim each year. The amount of
the single claim has an exponential distribution with pdf $f(x) = te^{-tx}$, $x > 0$.
The parameter t has a prior distribution with pdf $\pi(t) = te^{-t}$. A claim of 5
has been observed. Determine the posterior pdf of t.

12.93 Suppose that given $\Theta_1 = \theta_1$ and $\Theta_2 = \theta_2$ the random variables $X_1, \ldots,
X_n$ are independent and identically normally distributed with mean $\theta_1$ and
variance $\theta_2^{-1}$. Suppose also that the conditional distribution of $\Theta_1$ given
$\Theta_2 = \theta_2$ is a normal distribution with mean $\mu$ and variance $\sigma^2/\theta_2$ and $\Theta_2$ is
gamma distributed with parameters $\alpha$ and $\theta = 1/\beta$.

(a) Show that the posterior conditional distribution of $\Theta_1$ given $\Theta_2 =
\theta_2$ is normally distributed with mean

$$\mu_* = \frac{1}{1 + n\sigma^2}\mu + \frac{n\sigma^2}{1 + n\sigma^2}\bar{x}$$






n{x — yi) 
2{l J tna 2 Y 


d the posterior marginal means E(0i|x) and E(02|x). 
Table 12.7 Number of hospital liability claims by year

Year    Number of claims
1985    6
1986    2
1987    3
1988    0
1989    2
1990    1
1991    2
1992    5
1993    1
1994    3


Table 12.8 Hospital liability claims by frequency

Frequency (k)    Number of observations (n_k)
0                1
1                2
2                3
3                2
4                0
5                1
6                1
7+               0



ibility policy has experienced the number of 
、m in Table 12.7. Estimate the Poisson para- 


d 1985-1994 








from Table 12.8. Let $n_k$ denote the number of years in which a frequency of
exactly k claims occurred. The expected frequency (sample mean) is

$$\bar{x} = \frac{\sum_{k=0}^{\infty}k\,n_k}{\sum_{k=0}^{\infty}n_k},$$

where $n_k$ represents the number of observed values at frequency k. Hence the
method-of-moments estimate of the Poisson parameter is $\hat{\lambda} = 2.5$.

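Maximum likelihood estimation can easily be carried out on these data.
The likelihood contribution of an observation of k is $p_k$. Then the likelihood
for the entire set of observations is

$$L = \prod_{k=0}^{\infty}p_k^{n_k}$$

and the loglikelihood is

$$l = \sum_{k=0}^{\infty}n_k\ln p_k.$$

The likelihood and loglikelihood functions are considered to be functions of
the unknown parameters. In the case of the Poisson distribution, there is only
one parameter, making the maximization easy.

For the Poisson distribution,

$$p_k = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad \ln p_k = -\lambda + k\ln\lambda - \ln k!.$$

The loglikelihood is

$$
\begin{aligned}
l &= \sum_{k=0}^{\infty}n_k(-\lambda + k\ln\lambda - \ln k!) \\
&= -\lambda n + \sum_{k=0}^{\infty}k\,n_k\ln\lambda - \sum_{k=0}^{\infty}n_k\ln k!,
\end{aligned}
$$

where $n = \sum_{k=0}^{\infty}n_k$ is the sample size. Differentiating the loglikelihood with
respect to $\lambda$, we obtain

$$\frac{dl}{d\lambda} = -n + \frac{1}{\lambda}\sum_{k=0}^{\infty}k\,n_k.$$

Setting the derivative equal to zero gives $\hat{\lambda} = \sum_{k=0}^{\infty}k\,n_k/n = \bar{x}$.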
From this it can be seen that for the Poisson distribution the maximum
likelihood and the method-of-moments estimators are identical.

If N has a Poisson distribution with mean $\lambda$, then

$$E(\hat{\lambda}) = E(N) = \lambda$$

and

$$\mathrm{Var}(\hat{\lambda}) = \frac{\mathrm{Var}(N)}{n} = \frac{\lambda}{n}.$$

Hence, $\hat{\lambda}$ is unbiased and consistent. From Theorem 12.13, the maximum
likelihood estimator is asymptotically normally distributed with mean $\lambda$ and
variance

$$
\begin{aligned}
\mathrm{Var}(\hat{\lambda}) &= \left\{-nE\left[\frac{d^2}{d\lambda^2}\ln p_N\right]\right\}^{-1} \\
&= \left\{-nE\left[\frac{d^2}{d\lambda^2}(-\lambda + N\ln\lambda - \ln N!)\right]\right\}^{-1} \\
&= [nE(N/\lambda^2)]^{-1} \\
&= (n\lambda^{-1})^{-1} = \frac{\lambda}{n}.
\end{aligned}
$$

In this case the asymptotic approximation to the variance is equal to its
true value. From this information, we can construct an approximate 95%
confidence interval for the true value of the parameter. The interval is $\hat{\lambda} \pm
1.96(\hat{\lambda}/n)^{1/2}$. For this example, the interval becomes (1.52, 3.48). This
confidence interval is only an approximation because it relies on large sample
theory. The sample size is very small and such a confidence interval should
be used with caution. □
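The estimate and interval for the hospital liability data can be reproduced in a few lines. This is a small sketch using the frequencies of Table 12.8; the dictionary layout and variable names are illustrative choices.

```python
import math

# Table 12.8: frequency k -> number of years n_k with exactly k claims.
counts = {0: 1, 1: 2, 2: 3, 3: 2, 4: 0, 5: 1, 6: 1}

n = sum(counts.values())
lam_hat = sum(k * nk for k, nk in counts.items()) / n   # mle = sample mean
std_err = math.sqrt(lam_hat / n)                         # from Var = lambda / n
ci = (lam_hat - 1.96 * std_err, lam_hat + 1.96 * std_err)

print(lam_hat, ci)
```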



The formulas presented so far have assumed that the counts at each
observed frequency are known. Occasionally, data are collected so that this is
not given. The most common example is to have a final entry given as k+,
where the count is the number of times k or more claims were observed. If
$n_{k+}$ is the number of times this was observed, the contribution to the likelihood
function is

$$(p_k + p_{k+1} + \cdots)^{n_{k+}} = (1 - p_0 - \cdots - p_{k-1})^{n_{k+}}.$$


The same adjustments apply to grouped frequency data of any kind. Sup- 


likelihood function 
Table 12.9 Data for Example 12.53

No. of claims/day    Observed no. of days
0                    47
1                    97
2                    109
3                    62
4                    25
5                    16
6+                   9


Example 12.53 For the data in Table 12.9¹¹ determine the maximum
likelihood estimate for the Poisson distribution.

The likelihood function is

$$L = p_0^{47}p_1^{97}p_2^{109}p_3^{62}p_4^{25}p_5^{16}(1 - p_0 - p_1 - p_2 - p_3 - p_4 - p_5)^{9},$$

and when written as a function of $\lambda$, it becomes somewhat complicated. While



example, the maximum likelihood estimate is $\hat{\lambda} = 2.0226$, which is very close
to the value obtained when all the counts were recorded. □
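A direct numerical maximization of this likelihood is straightforward. The sketch below maximizes the loglikelihood for the Table 12.9 data with a golden-section search; the choice of optimizer and the search bracket (0.5 to 5) are assumptions made for this illustration, not the method used in the text.

```python
import math

exact_counts = {0: 47, 1: 97, 2: 109, 3: 62, 4: 25, 5: 16}   # Table 12.9
TAIL_FROM, TAIL_COUNT = 6, 9                                 # the "6+" row

def poisson_pf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def loglikelihood(lam):
    """Exact rows contribute n_k ln p_k; the 6+ row contributes 9 ln Pr(N >= 6)."""
    total = sum(nk * math.log(poisson_pf(k, lam)) for k, nk in exact_counts.items())
    tail_prob = 1.0 - sum(poisson_pf(k, lam) for k in range(TAIL_FROM))
    return total + TAIL_COUNT * math.log(tail_prob)

def golden_section_max(f, lo, hi, iters=100):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                      # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = f(c)
        else:                            # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

lam_hat = golden_section_max(loglikelihood, 0.5, 5.0)
print(lam_hat)
```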



12.5.2 Negative binomial 

The moment equations are

$$r\beta = \frac{\sum_{k=0}^{\infty}k\,n_k}{n} = \bar{x} \qquad (12.14)$$

and

$$r\beta(1 + \beta) = \frac{\sum_{k=0}^{\infty}k^2 n_k}{n} - \left(\frac{\sum_{k=0}^{\infty}k\,n_k}{n}\right)^2 = s^2 \qquad (12.15)$$

with solutions $\hat{\beta} = (s^2/\bar{x}) - 1$ and $\hat{r} = \bar{x}/\hat{\beta}$. Note that this variance estimate is
obtained by dividing by n, not n − 1. This is a common, though not required,
approach when using the method of moments. Also note that, if $s^2 < \bar{x}$, the
estimate of $\beta$ will be negative, an inadmissible value.

11 This is the same data as will be analyzed in Example 13.15 except the observations at 6
or more have been combined.

Example 12.54 (Example 12.52 continued) Estimate the negative binomial
parameters by the method of moments.

The sample mean and the sample variance are 2.5 and 3.05 (verify this),
respectively, and the estimates of the parameters are $\hat{r} = 11.364$ and $\hat{\beta} =
0.22$. □
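The "verify this" step and the two moment estimates can be carried out as follows, using the frequencies from Table 12.8 and the divisor n for the variance as noted in the text. Variable names are illustrative.

```python
# Table 12.8: frequency k -> number of years n_k.
counts = {0: 1, 1: 2, 2: 3, 3: 2, 4: 0, 5: 1, 6: 1}

n = sum(counts.values())
mean = sum(k * nk for k, nk in counts.items()) / n
second_moment = sum(k * k * nk for k, nk in counts.items()) / n
variance = second_moment - mean ** 2        # divides by n, not n - 1

beta_hat = variance / mean - 1              # solves (12.14) and (12.15)
r_hat = mean / beta_hat

print(mean, variance, r_hat, beta_hat)
```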


When compared to the Poisson distribution with the same mean, it can be
seen that $\beta$ is a measure of "extra-Poisson" variation. A value of $\beta = 0$ means
no extra-Poisson variation, while a value of $\beta = 0.22$ implies a 22% increase in
the variance when compared to the Poisson distribution with the same mean.

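We now examine maximum likelihood estimation. The loglikelihood for
the negative binomial distribution is

$$l = \sum_{k=0}^{\infty}n_k\ln p_k = \sum_{k=0}^{\infty}n_k\left[\ln\binom{r + k - 1}{k} - r\ln(1 + \beta) + k\ln\beta - k\ln(1 + \beta)\right].$$

To maximize it, differentiate with respect to each of the two
parameters, set the derivatives equal to zero, and solve.
The derivatives of the loglikelihood are

$$\frac{\partial l}{\partial\beta} = \sum_{k=0}^{\infty}n_k\left(\frac{k}{\beta} - \frac{r + k}{1 + \beta}\right)$$

and

$$
\begin{aligned}
\frac{\partial l}{\partial r} &= -\sum_{k=0}^{\infty}n_k\ln(1 + \beta) + \sum_{k=0}^{\infty}n_k\frac{\partial}{\partial r}\ln\frac{(r + k - 1)\cdots r}{k!} \\
&= -n\ln(1 + \beta) + \sum_{k=0}^{\infty}n_k\frac{\partial}{\partial r}\ln\prod_{m=0}^{k-1}(r + m) \\
&= -n\ln(1 + \beta) + \sum_{k=0}^{\infty}n_k\frac{\partial}{\partial r}\sum_{m=0}^{k-1}\ln(r + m) \\
&= -n\ln(1 + \beta) + \sum_{k=1}^{\infty}n_k\sum_{m=0}^{k-1}\frac{1}{r + m}.
\end{aligned}
$$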
Setting these equations to zero yields

$$\hat{\mu} = \hat{r}\hat{\beta} = \frac{\sum_{k=0}^{\infty}k\,n_k}{n} = \bar{x} \qquad (12.18)$$

and

$$n\ln(1 + \hat{\beta}) = \sum_{k=1}^{\infty}n_k\left(\sum_{m=0}^{k-1}\frac{1}{\hat{r} + m}\right). \qquad (12.19)$$

Note that the maximum likelihood estimator of the mean is the sample mean
(as, by definition, in the method of moments). Equations (12.18) and (12.19)
can be solved numerically. Replacing $\hat{\beta}$ in (12.19) by $\hat{\mu}/\hat{r} = \bar{x}/\hat{r}$ yields the equation

$$n\ln\left(1 + \frac{\bar{x}}{\hat{r}}\right) - \sum_{k=1}^{\infty}n_k\left(\sum_{m=0}^{k-1}\frac{1}{\hat{r} + m}\right) = 0. \qquad (12.20)$$

If the right-hand side of (12.15) is greater than the right-hand side of (12.14),
it can be shown that there is a unique solution of (12.20). If not, then the
negative binomial model is probably not a good model to use because the
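Equation (12.20) involves a single unknown, so a bracketing root finder is enough. The sketch below solves it by bisection for the Table 12.8 data, where the sample variance (3.05) exceeds the sample mean (2.5); the bracket from 0.01 to 10,000 and the helper names are assumptions made for illustration.

```python
import math

counts = {0: 1, 1: 2, 2: 3, 3: 2, 4: 0, 5: 1, 6: 1}      # Table 12.8
n = sum(counts.values())
xbar = sum(k * nk for k, nk in counts.items()) / n

def equation_12_20(r):
    """Left-hand side of (12.20) as a function of r."""
    inner = sum(nk * sum(1.0 / (r + m) for m in range(k))
                for k, nk in counts.items())
    return n * math.log1p(xbar / r) - inner

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi]; assumes a sign change on the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def loglikelihood(r, beta):
    """Negative binomial loglikelihood for the grouped counts."""
    return sum(nk * (math.lgamma(r + k) - math.lgamma(r) - math.lgamma(k + 1)
                     - r * math.log1p(beta)
                     + k * (math.log(beta) - math.log1p(beta)))
               for k, nk in counts.items())

r_hat = bisect(equation_12_20, 0.01, 1e4)
beta_hat = xbar / r_hat                                   # from (12.18)
print(r_hat, beta_hat, loglikelihood(r_hat, beta_hat))
```

The method-of-moments estimates from Example 12.54 make a natural comparison point: the loglikelihood at the solution of (12.20) should be at least as large as at those values.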
Table 12.10 Two models for automobile claims frequency

No. of         No. of      Poisson      Negative binomial
claims/year    drivers     expected     expected
0              20,592      20,420.9     20,596.8
1              2,651       2,945.1      2,631.0
2              297         212.4        318.4
3              41          10.2         37.8
4              7           0.4          4.4
5              0           0.0          0.5
6              1           0.0          0.1
7+             0           0.0          0.0

Parameters

Loglikelihood
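The Poisson column of Table 12.10 can be reproduced from the observed counts alone. The sketch below assumes the Poisson fit is the maximum likelihood fit, so that the parameter is the sample mean; the parameter value itself is not shown in the table as extracted here, and the variable names are illustrative.

```python
import math

# Observed number of drivers by claim count (Table 12.10).
drivers = {0: 20592, 1: 2651, 2: 297, 3: 41, 4: 7, 5: 0, 6: 1}

n = sum(drivers.values())
lam_hat = sum(k * nk for k, nk in drivers.items()) / n     # sample mean

# Expected number of drivers at each claim count under the fitted Poisson.
expected = {k: n * math.exp(-lam_hat) * lam_hat ** k / math.factorial(k)
            for k in range(7)}

for k in range(7):
    print(k, drivers[k], round(expected[k], 1))
```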





accidents per driver in a one-year time period. The data as well as fitted
Poisson and negative binomial distributions are given in Table 12.10. Based
on the information presented, which distribution appears to provide a better





be estimated. In many insurance situations, q is interpreted as the probability
of some event such as death or disability. In such cases the value of q is usually
estimated as

$$\hat{q} = \frac{\text{number of observed events}}{\text{maximum number of possible events}},$$

which is the method-of-moments estimator when m is known.























$$\hat{q} = \frac{1}{\hat{m}}\,\frac{\sum_{k=0}^{\infty}k\,n_k}{\sum_{k=0}^{\infty}n_k}, \qquad (12.21)$$

where $\hat{m}$ is the maximum likelihood estimate of m. An easy way to approach
the maximum likelihood estimation of m and q is to create a likelihood profile
for various possible values of m as follows:

Step 1 Start with $\hat{m}$ equal to the largest observation.
Step 2 Obtain $\hat{q}$ using (12.21).
Step 3 Calculate the loglikelihood at these values.
Step 4 Increase $\hat{m}$ by 1.
Step 5 Repeat steps 2-4 until a maximum is found.
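The five steps translate almost directly into a loop. The sketch below runs them on a small HYPOTHETICAL frequency table (chosen so that the sample variance is below the sample mean, which keeps the profile from increasing indefinitely); the data, the cap `max_m`, and the function names are illustrative assumptions, not values from the text.

```python
import math

# HYPOTHETICAL data: frequency k -> number of observations n_k.
counts = {0: 5, 1: 20, 2: 35, 3: 25, 4: 12, 5: 3}
n = sum(counts.values())
total = sum(k * nk for k, nk in counts.items())

def binomial_loglikelihood(m, q):
    return sum(nk * (math.lgamma(m + 1) - math.lgamma(k + 1)
                     - math.lgamma(m - k + 1)
                     + k * math.log(q) + (m - k) * math.log1p(-q))
               for k, nk in counts.items())

def profile_mle(max_m=1000):
    """Return (m_hat, q_hat, loglikelihood) from the likelihood profile."""
    m = max(k for k, nk in counts.items() if nk > 0)    # Step 1
    best = None
    while m <= max_m:
        q = total / (n * m)                             # Step 2: equation (12.21)
        loglik = binomial_loglikelihood(m, q)           # Step 3
        if best is not None and loglik < best[2]:
            return best                                 # Step 5: maximum found
        best = (m, q, loglik)
        m += 1                                          # Step 4
    return best                                         # cap reached, no interior maximum

m_hat, q_hat, loglik_hat = profile_mle()
print(m_hat, q_hat, loglik_hat)
```

The cap `max_m` guards against data for which the loglikelihood keeps increasing with m; in that case the returned value should not be read as a maximum.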


$$mq(1 - q) = 0.890355.$$

Hence, $\hat{q} = 0.096474$ and $\hat{m} = 10.21440$. However, m can only take on integer
values. We choose $\hat{m} = 10$ by rounding. Then we adjust the estimate of $\hat{q}$ to
0.0985422 from the first moment equation. Doing this will result in a model
variance which differs from the sample variance because 10(0.0985422)(1 −
0.0985422) = 0.888316. This shows one of the pitfalls of using the method of
moments with integer-valued parameters.
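The arithmetic in this paragraph can be retraced as follows. The sample mean of 0.985422 used below is inferred from the adjusted estimate (10 × 0.0985422) rather than read from a displayed equation, so treat it as an assumption; the sample variance 0.890355 is the value shown above.

```python
sample_mean = 0.985422          # inferred: m * q with m = 10, q = 0.0985422
sample_variance = 0.890355      # right-hand side of the second moment equation

# Solve m*q = mean and m*q*(1 - q) = variance.
q_hat = 1.0 - sample_variance / sample_mean
m_hat = sample_mean / q_hat

# Round m to an integer and re-solve the first moment equation for q.
m_rounded = round(m_hat)
q_adjusted = sample_mean / m_rounded
model_variance = m_rounded * q_adjusted * (1.0 - q_adjusted)

print(q_hat, m_hat, m_rounded, q_adjusted, model_variance)
```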

We now turn to maximum likelihood estimation. From the data $m \ge 7$.
If m is known, then only q needs to be estimated. If m is unknown, then we
can produce a likelihood profile by maximizing the likelihood for fixed values
of m starting at 7 and increasing until a maximum is found. The results are
in Table 12.12.

The largest loglikelihood value occurs at m = 10. If, a priori, the value
of m is unknown, then the maximum likelihood estimates of the parameters
are $\hat{m} = 10$ and $\hat{q} = 0.098542$.

As with the negative binomial, there need not be a pair of parameters that
maximizes the likelihood function. In particular, if the sample mean is less
[...] increasing loglikelihood values as [...] trend is toward a Poisson model.
[...] Example 12.52.
Table 12.12 Binomial likelihood profile

m     q̂           −Loglikelihood
7     0.140775    19,273.56
8     0.123178    19,265.37
9     0.109491    19,262.02
10    0.098542    19,260.98
11    0.089584    19,261.11
12    0.082119    19,261.84

12.5.4 The (a, b, 1) class

Estimation of the parameters for the (a, b, 1) class follows the same general
principles that were used in connection with the (a, b, 0) class.

Assuming that the data are in the same form as the previous examples, the
likelihood is, using (4.13),

$$L = (p_0^M)^{n_0}\prod_{k=1}^{\infty}(p_k^M)^{n_k} = (p_0^M)^{n_0}\prod_{k=1}^{\infty}\left[(1 - p_0^M)p_k^T\right]^{n_k}.$$

The loglikelihood is

$$
\begin{aligned}
l &= n_0\ln p_0^M + \sum_{k=1}^{\infty}n_k\left[\ln(1 - p_0^M) + \ln p_k^T\right] \\
&= n_0\ln p_0^M + \sum_{k=1}^{\infty}n_k\ln(1 - p_0^M) + \sum_{k=1}^{\infty}n_k\left[\ln p_k - \ln(1 - p_0)\right],
\end{aligned}
$$

where the last statement follows from $p_k^T = p_k/(1 - p_0)$. The three parameters
of the (a, b, 1) class are $p_0^M$, a, and b, where a and b determine $p_1, p_2, \ldots$.








dependent 
〕n because 


resulting in 
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the proportion of observations at zero. This is the natural estimator because 
Pq 1 represents the probability of an observation of zero. 

Similarly, because the likelihood factors conveniently, the estimation of a
and b is independent of $p_0^M$. Note that although a and b are parameters,
maximization should not be done with respect to them. That is because not
all values of a and b produce admissible probability distributions.13 For the
zero-modified Poisson distribution, the relevant part of the loglikelihood is


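$$\begin{aligned}
l_1 &= \sum_{k=1}^{\infty} n_k \ln \frac{e^{-\lambda}\lambda^k}{k!} - (n-n_0)\ln(1-e^{-\lambda}) \\
    &= -(n-n_0)\lambda + \left(\sum_{k=1}^{\infty} k\,n_k\right)\ln\lambda - (n-n_0)\ln(1-e^{-\lambda}) + c \\
    &= -(n-n_0)\left[\lambda + \ln(1-e^{-\lambda})\right] + n\bar{x}\ln\lambda + c,
\end{aligned}$$

where $\bar{x} = \frac{1}{n}\sum_{k=0}^{\infty} k\,n_k$ is the sample mean, $n = \sum_{k=0}^{\infty} n_k$, and c is
independent of λ. Hence,

$$\frac{\partial l_1}{\partial \lambda} = -(n-n_0) - (n-n_0)\frac{e^{-\lambda}}{1-e^{-\lambda}} + \frac{n\bar{x}}{\lambda} = -\frac{n-n_0}{1-e^{-\lambda}} + \frac{n\bar{x}}{\lambda}.$$

Setting this to zero yields

$$\bar{x}\,(1-e^{-\lambda}) = \frac{n-n_0}{n}\,\lambda.$$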


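By graphing each side as a function of λ, it is clear that, if $n_0 > 0$, there exist
two roots: one at λ = 0 and the other at some λ > 0. The positive root is the
maximum likelihood estimate and must be found numerically. Because
$\hat p_0^M = n_0/n$, the equation can be rewritten as
$\bar{x} = (1-\hat p_0^M)\lambda/(1-e^{-\lambda})$, so the maximum likelihood estimates
match the model's probability at zero to the observed
proportion at zero and the theoretical mean to the sample mean. This suggests
that, by fixing the zero probability to the observed proportion at zero and
equating the low order moments, a modified moment method can be used
to get starting values for numerical maximization of the likelihood function.
Because the maximum likelihood method has better asymptotic properties,
it is preferable to use the modified moment method only to obtain starting
values.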

For the purpose of obtaining estimates of the asymptotic variance of the
maximum likelihood estimator of λ, it is easy to obtain

$$\frac{\partial^2 l_1}{\partial \lambda^2} = (n-n_0)\frac{e^{-\lambda}}{(1-e^{-\lambda})^2} - \frac{n\bar{x}}{\lambda^2},$$

and the expected value is obtained by observing that
$E(\bar{x}) = (1-p_0^M)\lambda/(1-e^{-\lambda})$.
Finally, $p_0^M$ may be replaced by its estimator, $n_0/n$. The variance of
$\hat p_0^M$ is obtained by observing that the numerator, $n_0$, has a binomial distribution
and therefore the variance is $p_0^M(1-p_0^M)/n$.

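For the zero-modified binomial distribution,

$$\begin{aligned}
l_1 &= \left(\sum_{k=1}^{m} k\,n_k\right)\ln q + \sum_{k=1}^{m} (m-k)\,n_k \ln(1-q) - \sum_{k=1}^{m} n_k \ln[1-(1-q)^m] + c \\
    &= n\bar{x}\ln q + m(n-n_0)\ln(1-q) - n\bar{x}\ln(1-q) - (n-n_0)\ln[1-(1-q)^m] + c,
\end{aligned}$$

where c does not depend on q and

$$\frac{\partial l_1}{\partial q} = \frac{n\bar{x}}{q} - \frac{m(n-n_0)}{1-q} + \frac{n\bar{x}}{1-q} - \frac{(n-n_0)\,m(1-q)^{m-1}}{1-(1-q)^m}.$$

Setting this to zero yields

$$\bar{x} = \frac{1-\hat p_0^M}{1-p_0}\,mq,$$

where we recall that $p_0 = (1-q)^m$. This equation matches the theoretical
mean with the sample mean.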

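If m is known and fixed, the maximum likelihood estimator of $p_0^M$ is still
the proportion of observations at zero, $n_0/n$. If m is unknown, the procedure can be
repeated for successive values of m until the maximum of the likelihood function
is found.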

The zero-modified negative binomial (or extended truncated negative
binomial) distribution is a bit more complicated because three parameters need
to be estimated. Of course, the maximum likelihood estimator of $p_0^M$ is
$\hat p_0^M = n_0/n$ as before, reducing the problem to the estimation of r and β.
The part of the loglikelihood relevant to r and β is

$$l_1 = \sum_{k=1}^{\infty} n_k \ln p_k - (n-n_0)\ln(1-p_0). \qquad (12.25)$$

This function needs to be maximized over the (r, β) plane to obtain the
maximum likelihood estimates. This can be done numerically using maximization
procedures such as those described in Appendix F. Starting values can be
obtained by the modified moment method by setting $\hat p_0^M = n_0/n$ and equating
the first two moments of the distribution to the first two sample moments.
It is generally easier to use raw moments (moments about the origin) than
central moments for this purpose. In practice, it may be more convenient to
maximize (12.25) rather than (12.26) because one can take advantage of the
recursive scheme $p_k = p_{k-1}(a + b/k)$
in evaluating (12.25). This makes computer programming a bit easier.

For zero-truncated distributions there is no need to estimate the probability
at zero because it is known to be zero. The remaining parameters are
estimated using the same formulas developed for the zero-modified
distributions.

Example 12.58 The data set in Table 12.13 comes from Beard et al. [12].
Determine a model that adequately describes the data.

When a Poisson distribution is fitted to it, the resulting fit is very poor.
There is too much probability for one accident and too little at subsequent
values.

Table 12.13 Fitted distributions to Beard data

Accidents      Observed   Poisson      Geometric    ZM Poisson       ZM geom.
0              370,412    369,246.9    372,206.5    370,412.0        370,412.0
1               46,545     48,643.6     43,325.8     46,432.1         46,555.2
2                3,935      3,204.1      5,043.2      4,138.6          3,913.6
3                  317        140.7        587.0        245.9            329.0
4                   28          4.6         68.3         11.0             27.7
5                    3          0.1          8.0          0.4              2.3
6+                   0          0.0          1.0          0.0              0.2
Parameters                λ: 0.13174   β: 0.13174   p_0^M: 0.87934   p_0^M: 0.87934
                                                    λ: 0.17827       β: 0.091780
Loglikelihood             -171,373     -171,479     -171,160         -171,133

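For the geometric distribution, the loglikelihood is

$$\begin{aligned}
l &= -n\ln(1+\beta) + \sum_{k=1}^{\infty} n_k \ln\left(\frac{\beta}{1+\beta}\right)^{k} \\
  &= -n\ln(1+\beta) + \sum_{k=1}^{\infty} k\,n_k\,[\ln\beta - \ln(1+\beta)] \\
  &= -n\ln(1+\beta) + n\bar{x}\,[\ln\beta - \ln(1+\beta)] \\
  &= -(n+n\bar{x})\ln(1+\beta) + n\bar{x}\ln\beta,
\end{aligned}$$

where $\bar{x} = \sum_{k=1}^{\infty} k\,n_k/n$ and $n = \sum_{k=0}^{\infty} n_k$.

Differentiation reveals that the loglikelihood has a maximum at

$$\hat\beta = \bar{x}.$$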

A qualitative look at the numbers indicates that the zero-modified geometric 
distribution matches the data better than the other three models considered. 
A formal analysis is done in Example 13.16. □ 
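The parameter estimates reported in Table 12.13 can be checked with a short Python sketch. The function names are illustrative assumptions. The zero-modified Poisson estimate solves the likelihood equation of Section 12.5.4 by bisection, and the zero-modified geometric estimate is obtained here by matching the zero-truncated mean 1 + β to the mean of the positive observations, which is assumed (consistently with the table) to be the maximum likelihood solution.

```python
import math

# Observed counts from Table 12.13: observed[k] = number of policies with k accidents
observed = [370412, 46545, 3935, 317, 28, 3]
n = sum(observed)
n0 = observed[0]
total_claims = sum(k * n_k for k, n_k in enumerate(observed))
xbar = total_claims / n

def poisson_loglik(lam, counts):
    """Ordinary Poisson loglikelihood for grouped count data."""
    return sum(n_k * (-lam + k * math.log(lam) - math.lgamma(k + 1))
               for k, n_k in enumerate(counts))

def zm_poisson_lambda(xbar, p0_hat, lo=1e-8, hi=50.0, tol=1e-12):
    """Positive root of xbar*(1 - exp(-lam)) = (1 - p0_hat)*lam, found by bisection."""
    def g(lam):
        return -xbar * math.expm1(-lam) - (1.0 - p0_hat) * lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:   # still to the left of the positive root
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam_poisson = xbar                       # Poisson: lambda-hat equals the sample mean
beta_geometric = xbar                    # geometric: beta-hat equals the sample mean
p0_hat = n0 / n                          # zero-modified models: proportion at zero
lam_zm = zm_poisson_lambda(xbar, p0_hat)
beta_zm = total_claims / (n - n0) - 1.0  # zero-truncated geometric mean is 1 + beta
```

Under these assumptions the sketch should reproduce the table's parameter values to the digits shown; the loglikelihoods of the zero-modified models could be evaluated in the same way from the formulas of Section 12.5.4.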

12.5.5 Compound models 

For the method of moments, the first few moments can be matched with the
sample moments. The system of equations can be solved to obtain the
moment-based estimators. Note that the number of parameters in the compound
model is the sum of the number of parameters in the primary and secondary
distributions. The first two theoretical moments for compound distributions
are

$$E(S) = E(N)E(M),$$

$$Var(S) = E(N)\,Var(M) + E(M)^2\,Var(N).$$

These results were developed in Chapter 6. The first three moments for the
compound Poisson distribution are given in (4.27).

Maximum likelihood estimation is also carried out as before. The
loglikelihood to be maximized is

$$l = \sum_{k=0}^{\infty} n_k \ln g_k.$$

When $g_k$ is the probability of a compound distribution, the loglikelihood can
be maximized numerically. The first and second derivatives of the
loglikelihood can be obtained by using approximate differentiation methods as applied
directly to the loglikelihood function at the maximum value.

Example 12.59 Determine various properties of the Poisson-zero-truncated
geometric distribution. This distribution is also called the Polya-Aeppli
distribution.

For the zero-truncated geometric distribution the pgf is

$$P_2(z) = \frac{[1-\beta(z-1)]^{-1} - (1+\beta)^{-1}}{1-(1+\beta)^{-1}},$$

and therefore the pgf of the Polya-Aeppli distribution is

$$P(z) = P_1[P_2(z)] = \exp\left\{\lambda\left[\frac{[1-\beta(z-1)]^{-1} - (1+\beta)^{-1}}{1-(1+\beta)^{-1}} - 1\right]\right\} = \exp\left\{\lambda\,\frac{[1-\beta(z-1)]^{-1} - 1}{1-(1+\beta)^{-1}}\right\}.$$

The mean is

$$P'(1) = \lambda(1+\beta)$$

and the variance is

$$P''(1) + P'(1) - [P'(1)]^2 = \lambda(1+\beta)(1+2\beta).$$

Alternatively, E(N) = Var(N) = λ, E(M) = 1 + β, and Var(M) = β(1 + β).
Then,

$$E(S) = \lambda(1+\beta),$$

$$Var(S) = \lambda\beta(1+\beta) + \lambda(1+\beta)^2 = \lambda(1+\beta)(1+2\beta).$$

From Theorem 4.51, the probability at zero is

$$g_0 = P_1(0) = e^{-\lambda}.$$

The successive values of $g_k$ are computed easily using the compound Poisson
recursion

$$g_k = \frac{\lambda}{k}\sum_{j=1}^{k} j\,f_j\,g_{k-j}, \quad k = 1, 2, 3, \ldots, \qquad (12.27)$$
Table 12.14 Automobile claims by year

Year    Exposure    Claims
1986    2,145       207
1987    2,452       227
1988    3,112       341
1989    3,458       335
1990    3,698       362
1991    3,872       359

where $f_j = \beta^{j-1}/(1+\beta)^j$, j = 1, 2, .... For any values of λ and β, the
loglikelihood function can be easily evaluated. □
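The recursion (12.27) and the resulting loglikelihood can be sketched in Python as below. The function names and the parameter values in the usage lines are illustrative assumptions, not taken from the text.

```python
import math

def polya_aeppli_pmf(lam, beta, kmax):
    """Probabilities g_0, ..., g_kmax from the compound Poisson recursion (12.27)."""
    # Zero-truncated geometric secondary probabilities f_j, j = 1, ..., kmax (f[0] unused)
    f = [0.0] + [beta ** (j - 1) / (1.0 + beta) ** j for j in range(1, kmax + 1)]
    g = [math.exp(-lam)]  # g_0 = exp(-lambda)
    for k in range(1, kmax + 1):
        g.append(lam / k * sum(j * f[j] * g[k - j] for j in range(1, k + 1)))
    return g

def polya_aeppli_loglik(counts, lam, beta):
    """Loglikelihood sum of n_k ln g_k, where counts[k] = n_k."""
    g = polya_aeppli_pmf(lam, beta, len(counts) - 1)
    return sum(n_k * math.log(g[k]) for k, n_k in enumerate(counts) if n_k > 0)

# Illustrative parameter values (hypothetical)
lam, beta = 2.0, 0.5
g = polya_aeppli_pmf(lam, beta, 200)
mean = sum(k * g_k for k, g_k in enumerate(g))
variance = sum(k * k * g_k for k, g_k in enumerate(g)) - mean ** 2
```

With enough terms, the computed mean and variance should agree with λ(1 + β) and λ(1 + β)(1 + 2β) from the example, which gives a simple check on an implementation before it is passed to a numerical maximizer.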

Example 13.17 provides a data set for which the Polya-Aeppli distribution 
is a good choice. 

Another useful compound Poisson distribution is the Poisson-extended
truncated negative binomial (Poisson-ETNB) distribution. Although it does
not matter if the secondary distribution is modified or truncated, we prefer the
truncated version here so that the parameter r may be extended.14 Special
cases are: r = 1, which is the Poisson-geometric (also called Polya-Aeppli);
r → 0, which is the Poisson-logarithmic (negative binomial); and r = -0.5,
which is called the Poisson-inverse Gaussian. This name is not consistent
with the others. Here the inverse Gaussian distribution is a mixing
distribution (see Section 4.6.9). Example 13.18 provides a data set for which the
Poisson-inverse Gaussian distribution is a good choice.

12.5.6 Effect of exposure on maximum likelihood estimation 

In Section 4.6.11 the effect of exposure on discrete distributions was discussed. 
When aggregate data from a large group of insureds is obtained, maximum 
likelihood estimation is still possible. The following example illustrates this 
for the Poisson distribution. 

Example 12.60 Determine the maximum likelihood estimate of the Poisson 
parameter for the data in Table 12.14. 

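Let λ be the Poisson parameter for a single exposure. If year k has $e_k$
exposures, then the number of claims has a Poisson distribution with parameter
$\lambda e_k$. If $n_k$ is the number of claims in year k, the loglikelihood is
$l = \sum_{k=1}^{6} [-\lambda e_k + n_k \ln(\lambda e_k) - \ln n_k!]$. Setting its derivative with
respect to λ equal to zero gives

$$\sum_{k=1}^{6} \left(-e_k + n_k\lambda^{-1}\right) = 0,$$

$$\hat\lambda = \frac{\sum_{k=1}^{6} n_k}{\sum_{k=1}^{6} e_k} = \frac{1{,}831}{18{,}737} = 0.09772.$$

□

14 This does not contradict Theorem 4.54. When -1 < r < 0, it is still the case that
changing the probability at zero will not produce new distributions. What is true is that
there is no probability at zero which will lead to an ordinary (a, b, 0) negative binomial
secondary distribution.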

In this example the answer is what we expected it to be, the average number
of claims per exposure. This technique will work for any distribution in the
(a, b, 0)15 and compound classes. But care must be taken in the interpretation
of the model. For example, if we use a negative binomial distribution, we are
assuming that each exposure unit produces claims according to a negative
binomial distribution. This is different from assuming that total claims have
a negative binomial distribution because they arise from individuals who each
have a Poisson distribution but with different parameters.
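The calculation of Example 12.60 can be sketched in Python using the data of Table 12.14. The function name is an illustrative assumption; the loglikelihood is the Poisson loglikelihood with parameter λ times the exposure for each year.

```python
import math

# Table 12.14: exposures and claims for 1986 through 1991
exposures = [2145, 2452, 3112, 3458, 3698, 3872]
claims = [207, 227, 341, 335, 362, 359]

def poisson_exposure_loglik(lam, exposures, claims):
    """Loglikelihood when year k has a Poisson(lam * e_k) number of claims."""
    return sum(-lam * e + n * math.log(lam * e) - math.lgamma(n + 1)
               for e, n in zip(exposures, claims))

# Closed-form maximum likelihood estimate: total claims over total exposure
lam_hat = sum(claims) / sum(exposures)
ll_hat = poisson_exposure_loglik(lam_hat, exposures, claims)
```

For distributions without a closed-form solution, the same loglikelihood function could be handed to a numerical maximizer.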




12.5.7 Exercises 

12.94 Assume that the binomial parameter m is known. Consider the
maximum likelihood estimator of q.

(a) Show that the maximum likelihood estimator is unbiased.

(b) Determine the variance of the maximum likelihood estimator.

(c) Show that the asymptotic variance as given in Theorem 12.13 is
the same as that developed in part (b).

(d) Determine a simple formula for a confidence interval using (9.4) on
page 276 that is based on replacing q with its estimate in the variance term.

(e) Determine a more complicated formula for a confidence interval
using (9.3) that is not based on such a replacement. This should
be done in a manner similar to that used in Example 11.12 on page
309.


15 For the binomial distribution, the usual problem that m must be an integer remains.

Table 12.15 Data for Exercise 12.96

No. of claims    No. of policies
0                9,048
1                  905
2                   45
3                    2
4+                   0

12.95 Use (12.18) to determine the maximum likelihood estimator of β for the
geometric distribution. In addition, determine the variance of the maximum
likelihood estimator and verify that it matches the asymptotic variance as
given in Theorem 12.13.

12.96 A portfolio of 10,000 risks produced the claim counts in Table 12.15. 

(a) Determine the maximum likelihood estimate of λ for a Poisson
model and then determine a 95% confidence interval for λ.

(b) Determine the maximum likelihood estimate of β for a geometric
model and then determine a 95% confidence interval for β.

(c) Determine the maximum likelihood estimate of r and β for a
negative binomial model.

(d) Assume that m = 4. Determine the maximum likelihood estimate
of q of the binomial model.

(e) Construct 95% confidence intervals for q using the methods
developed in parts (d) and (e) of Exercise 12.94.

(f) Determine the maximum likelihood estimate of m and q by con¬ 
structing a likelihood profile. 

12.97 An automobile insurance policy provides benefits for accidents caused
by both underinsured and uninsured motorists. Data on 1,000 policies
revealed the information in Table 12.16.

(a) Determine the maximum likelihood estimate of λ for a Poisson
model for each of the variables $N_1$ = number of underinsured claims
and $N_2$ = number of uninsured claims.

(b) Assume that $N_1$ and $N_2$ are independent. Use Theorem 4.37 on
page 74 to determine a model for $N = N_1 + N_2$.

12.98 An alternative method of obtaining a model for N in Exercise 12.97 
would be to record the total number of underinsured and uninsured claims 
for each of the 1,000 policies. Suppose this was done and the results were as 
in Table 12.17. 


Table 12.16 Data for Exercise 12.97

No. of claims    Underinsured    Uninsured
0                901             947
1                 92              50
2                  5               2
3                  1               1
4                  1               0
5+                 0               0


Table 12.17 Data for Exercise 12.98

No. of claims    No. of policies
0                861
1                121
2                 13
3                  3
4                  1
5                  0
6                  1
7+                 0


(a) Determine the maximum likelihood estimate of λ for a Poisson
model.

(b) The answer to part (a) matched the answer to part (c) of the
previous exercise. Demonstrate that this must always be so.

(c) Determine the maximum likelihood estimate of β for a geometric
model.

(d) Determine the maximum likelihood estimate of r and β for a
negative binomial model.

(e) Assume that m = 7. Determine the maximum likelihood estimate 
of q of the binomial model. 

(f) Determine the maximum likelihood estimates of m and q by con¬ 
structing a likelihood profile. 

12.99 The data in Table 12.18 represent the number of prescriptions filled in 
one year for a group of elderly members of a group insurance plan. 

(a) Determine the maximum likelihood estimate of λ for a Poisson
model.

(b) Determine the maximum likelihood estimate of β for a geometric
model and then determine a 95% confidence interval for β.

(c) Determine the maximum likelihood estimates of r and β for a
negative binomial model.

12.6 BIVARIATE MODELS 
12.6.1 Introduction 

At times a bivariate distribution with dependent variables is the appropriate
model. One such situation is a joint life annuity or insurance. Here the timing
of the payments depends on the first or second death of two individuals.
Because these individuals are often related (typically spouses), the time of
death will be dependent. As another example, in casualty insurances it is
common to record the expenses that are directly related to the payment of
the loss, referred to as the allocated loss adjustment expenses (ALAE). The
loss and the ALAE are usually strongly positively correlated.

There are a variety of sources for bivariate and multivariate models. Among
them are the books by Hutchinson and Lai ([64]), Kotz, Balakrishnan, and
Johnson ([80]), and Mardia ([89]). However, most of the distributions have
marginal distributions that are not of interest for actuarial applications or
have the parameters related in an unsuitable manner (for example, a bivariate
gamma distribution in which both X and Y must have the same value for α).
One exception is the bivariate lognormal distribution for which the logarithms
of the two variables have a bivariate normal distribution.

Of more interest and practical value are methods which construct bivariate
models from known marginal distributions. For example, suppose it were
known that losses have the Pareto distribution and that ALAE have the
gamma distribution. Then those parameters could be estimated (and the
models themselves determined) from the marginal data. Then, they could be
combined into a bivariate distribution that introduces a degree of association
between the two variables. Among the methods available, the copula has
received a lot of attention in the actuarial literature and is the only one that
will be covered here.


Table 12.19 Twenty-four losses with ALAE.

Loss      ALAE      Loss       ALAE
 1,500       301     11,750     2,530
 2,000     3,043     12,500       165
 2,500       415     14,000       175
 2,500     4,940     14,750    28,217
 4,500       395     15,000     2,072
 5,000        25     17,500     6,328
 5,750    34,474     19,833       212
 7,000        50     30,000     2,172
 7,000    10,593     33,033     7,845
 7,500        50     44,887     2,178
 9,000       406     62,500    12,251
10,000     1,174    210,000     7,357


12.6.2 Copulas 

Copula distributions are created using a function, also called a copula. This
function must itself be a legitimate bivariate distribution function over the
unit square with uniform marginals. Denote the two marginal distribution
functions $F_X(x)$ and $F_Y(y)$ and the copula function C(u, v). The bivariate
distribution function created by the three is then

$$F_{X,Y}(x,y) = C[F_X(x), F_Y(y)].$$

A simple but fairly useless example is the copula C(u, v) = uv. This creates
the bivariate distribution function $F_{X,Y}(x,y) = F_X(x)F_Y(y)$, which is true
for independent variables.

A good general introduction is [42] and an introduction for actuaries can
be found in [40]. The paper by Frees, Carriere, and Valdez [39] works with
Frank's copula for a study of joint lifetimes. An expanded version of the
example presented here can be found in [78]. The last two cited papers show
how to write the likelihood function under various truncations and censoring.

Example 12.61 The loss and ALAE were recorded for each of 24 claims 
(Table 12.19). Determine a model for the joint distribution using Frank's 
copula with Pareto distributions for both marginals. 


Frank's copula is (where $\log_\alpha$ means the logarithm base α)

$$C(u,v) = \log_\alpha\left[1 + \frac{(\alpha^u-1)(\alpha^v-1)}{\alpha-1}\right], \qquad (12.28)$$

where the parameter α controls the degree of association between the two
variables. Values of α less than 1 indicate a positive association and values greater
than 1 indicate a negative association.
Starting values for the four Pareto parameters were obtained by finding the
maximum likelihood estimates for the two marginal distributions. Simplex
maximization yields the copula estimate α = 0.133024 and the four Pareto
estimates 2.59889, 36,141.4, 0.759943, and 803.839. The positive association is apparent and could
be tested. One way is to use the likelihood ratio test discussed in Chapter 13.
It turns out that with the small sample size there is not sufficient evidence to
be sure there is a positive association. □

A number of results concerning Frank's copula can be found in the paper
by Genest [41]. Two are presented here. To simulate an (X, Y) pair, begin
by simulating the X value from the marginal distribution. This can be done
using the standard inversion technique. Follow this by simulating a value of
Y from the conditional distribution of Y given X = x. To do this, first note
that the distribution function is

$$F_{Y|X}(y|x) = \frac{(\partial/\partial x)F(x,y)}{f_X(x)}.$$

For Frank's copula we have

$$\frac{\partial}{\partial x}F(x,y) = f_X(x)\left.\frac{\partial}{\partial u}C(u,v)\right|_{u=F_X(x),\,v=F_Y(y)} = \frac{f_X(x)\,\alpha^{F_X(x)}\left[\alpha^{F_Y(y)}-1\right]}{\alpha-1+\left[\alpha^{F_X(x)}-1\right]\left[\alpha^{F_Y(y)}-1\right]}.$$

To simulate a conditional value of Y using the inversion method discussed in
Chapter 17, obtain a uniform(0,1) random number r and solve the equation

$$r = \frac{\alpha^{F_X(x)}\left[\alpha^{F_Y(y)}-1\right]}{\alpha-1+\left[\alpha^{F_X(x)}-1\right]\left[\alpha^{F_Y(y)}-1\right]}$$

for $F_Y(y)$ to obtain

$$\alpha^{F_Y(y)} = 1 + \frac{r(\alpha-1)}{\alpha^{F_X(x)}(1-r)+r}$$

or

$$F_Y(y) = \frac{1}{\ln\alpha}\ln\left[1 + \frac{r(\alpha-1)}{\alpha^{F_X(x)}(1-r)+r}\right].$$

The right-hand side is a number and then the distribution function of Y can
be inverted to solve for the simulated value.
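The simulation steps can be sketched in Python as follows. The function names are illustrative assumptions. The Pareto marginals use the inverse distribution function of a two-parameter Pareto with distribution function 1 - [scale/(scale + x)]^shape, and the parameter values in the usage lines are the estimates quoted in Example 12.61, assigned to the two marginals under the assumption that the first pair belongs to the loss and the second to the ALAE.

```python
import math
import random

def frank_copula(u, v, alpha):
    """Frank's copula C(u, v) as in (12.28), with alpha > 0 and alpha != 1."""
    return math.log(1.0 + (alpha ** u - 1.0) * (alpha ** v - 1.0) / (alpha - 1.0)) / math.log(alpha)

def conditional_cdf(u, v, alpha):
    """Partial derivative of C with respect to u: P(F_Y(Y) <= v | F_X(X) = u)."""
    a_u, a_v = alpha ** u, alpha ** v
    return a_u * (a_v - 1.0) / (alpha - 1.0 + (a_u - 1.0) * (a_v - 1.0))

def simulate_v_given_u(u, r, alpha):
    """Solve conditional_cdf(u, v, alpha) = r for v = F_Y(y)."""
    a_u = alpha ** u
    return math.log(1.0 + r * (alpha - 1.0) / (a_u * (1.0 - r) + r)) / math.log(alpha)

def pareto_inverse(p, shape, scale):
    """Inverse of F(x) = 1 - (scale / (scale + x))**shape."""
    return scale * ((1.0 - p) ** (-1.0 / shape) - 1.0)

def simulate_pair(alpha, x_params, y_params, rng):
    """Simulate one (X, Y) pair: X by inversion, then Y from its conditional distribution."""
    u = rng.random()
    r = rng.random()
    v = simulate_v_given_u(u, r, alpha)
    return pareto_inverse(u, *x_params), pareto_inverse(v, *y_params)

# Usage with the quoted estimates (assignment to marginals is an assumption)
rng = random.Random(12345)
alpha_hat = 0.133024
pairs = [simulate_pair(alpha_hat, (2.59889, 36141.4), (0.759943, 803.839), rng)
         for _ in range(5)]
```

Any other marginal distribution could be substituted by replacing `pareto_inverse` with the corresponding inverse distribution function.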

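The regression function can be found from

$$E(Y|X=x) = \int_0^\infty \left[1 - F_{Y|X}(y|x)\right] dy = \int_0^\infty \left\{1 - \frac{\alpha^{F_X(x)}\left[\alpha^{F_Y(y)}-1\right]}{\alpha-1+\left[\alpha^{F_X(x)}-1\right]\left[\alpha^{F_Y(y)}-1\right]}\right\} dy,$$

but it is likely that the integral will have to be done numerically.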


12.6.3 Exercise 

12.100 Consider the data set in Table 12.19. Fit a bivariate distribution using
Frank's copula where each marginal distribution has the inverse exponential
distribution.


12.7 MODELS WITH COVARIATES 
12.7.1 Introduction 

It may be that the distribution of the random variable of interest depends
on certain characteristics of the underlying situation. For example, the
distribution of time to death may be related to the individual's age, gender,
smoking status, blood pressure, height, and weight. Or, consider the number
of automobile accidents a vehicle has in a year. The distribution of this
variable might be related to the number of miles it is driven, where it is driven,
and various characteristics of the primary driver such as age, gender, marital
status, and driving history.

Example 12.62 Suppose we believe that the distribution of the number of 
accidents a driver has in a year is related to the driver’s age and gender. 
Provide three approaches to modeling this situation. 

Of course there is no limit to the number of models that could be considered. 
Three that might be used are given below. 




















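Construct a fully parametric model for this situation. As an
example, the number of accidents for a given driver could be assumed
to have a Poisson distribution with parameter λ. The value of λ is
then assumed to depend on the age x and the gender (g = 1 for males,
g = 0 for females) in some way such as

$$\lambda = (\alpha_0 + \alpha_1 x + \alpha_2 x^2)\,\beta^{g}.$$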



y, distribution, or hazard rate function

12.7.2 Proportional hazards models

A particular model that is relatively easy to work with is the Cox proportional
hazards model.

Definition 12.63 Given a baseline hazard rate function $h_0(t)$ and values
$z_1, \ldots, z_p$ associated with a particular individual, the Cox proportional
hazards model for that person is given by the hazard rate function

$$h(x|z) = h_0(x)\,c(\beta_1 z_1 + \cdots + \beta_p z_p) = h_0(x)\,c(\beta^T z),$$

where c(y) is any function that takes on only positive values, $z = (z_1, \ldots, z_p)^T$
is a column vector of the z values (called covariates), and $\beta = (\beta_1, \ldots, \beta_p)^T$
is a column vector of coefficients.













□

Table 12.20 Fire insurance payments

z1    z2    Payment
10    0     70
20    0     22
30    0     90*
40    0     81
50    0      8
10    1     51
20    1     95*
30    1     55
40    1     85*
50    1     93

*The payment was made at the policy limit.


parametric. In the spirit of this text, we will use maximum likelihood for
estimation of $\beta_1$ and $\beta_2$. We will begin with a fully parametric example.

Example 12.65 For the fire insurance example, 10 payments are in Table
12.20. All values are expressed as a percentage of the house's value. Estimate
the parameters of the Cox proportional hazards model using maximum
likelihood and both an exponential and a beta distribution for the baseline hazard
rate function. There is no deductible on these policies, but there is a policy
limit (which differs by policy).

In order to construct the likelihood function, we need the density and
survival functions. Let $c_j = \exp(\beta^T z)$ be the Cox multiplier for the jth
observation. Then, as noted in the previous example, $S_j(x) = S_0(x)^{c_j}$, where
$S_0(x)$ is the baseline distribution. The density function is

$$f_j(x) = -S_j'(x) = -c_j S_0(x)^{c_j-1} S_0'(x) = c_j S_0(x)^{c_j-1} f_0(x).$$

For the exponential distribution,

$$S_j(x) = \left[e^{-x/\theta}\right]^{c_j} = e^{-c_j x/\theta} \quad\text{and}\quad f_j(x) = \frac{c_j}{\theta}\,e^{-c_j x/\theta},$$

and for the beta distribution,

$$S_j(x) = [1-\beta(a,b;x)]^{c_j} \quad\text{and}\quad f_j(x) = c_j\,[1-\beta(a,b;x)]^{c_j-1}\,\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,x^{a-1}(1-x)^{b-1},$$

where $\beta(a,b;x)$ is the distribution function for a beta distribution with
parameters a and b [available in Excel® as BETADIST(x,a,b)]. The gamma
function is available in Excel® as EXP(GAMMALN(a)). For policies with
payments not at the limit, the contribution to the likelihood function is the
density function while for those paid at the limit it is the survival function.
In both cases, the likelihood function is sufficiently complex that it is not
worth writing out. The parameter estimates for the exponential model are
$\hat\beta_1 = 0.00319$, $\hat\beta_2 = -0.63722$, and $\hat\theta = 0.74041$. The value of the logarithm
of the likelihood function is -6.1379. For the beta model, the estimates are
$\hat\beta_1 = -0.00315$, $\hat\beta_2 = -0.77847$, $\hat a = 1.03706$, and $\hat b = 0.81442$. The value
of the logarithm of the likelihood function is -4.2155. Using the Schwarz
Bayesian criterion (see Section 13.5.3), an improvement of ln(10)/2 = 1.1513
is needed to justify a fourth parameter. The improvement here is
6.1379 - 4.2155 = 1.9224, so the beta model is preferred. If an
estimate of the information matrix is desired, the only reasonable strategy is
to take numerical derivatives of the loglikelihood. □
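The exponential-baseline loglikelihood of Example 12.65 can be sketched in Python. The function name is an illustrative assumption, and the payments of Table 12.20 are entered as proportions of the house value (70% as 0.70), which is assumed here because it is the scale on which the reported value of θ reproduces the reported loglikelihood.

```python
import math

# (z1, z2, payment as a proportion of house value, paid at policy limit) from Table 12.20
fire_data = [
    (10, 0, 0.70, False), (20, 0, 0.22, False), (30, 0, 0.90, True),
    (40, 0, 0.81, False), (50, 0, 0.08, False), (10, 1, 0.51, False),
    (20, 1, 0.95, True),  (30, 1, 0.55, False), (40, 1, 0.85, True),
    (50, 1, 0.93, False),
]

def cox_exponential_loglik(beta1, beta2, theta, data):
    """Loglikelihood for the Cox model with an exponential(theta) baseline."""
    loglik = 0.0
    for z1, z2, x, at_limit in data:
        c = math.exp(beta1 * z1 + beta2 * z2)   # Cox multiplier c_j
        loglik += -c * x / theta                # ln S_j(x), used by every observation
        if not at_limit:
            loglik += math.log(c / theta)       # density adds ln(c_j / theta)
    return loglik

# Evaluate at the estimates reported in the example
ll_reported = cox_exponential_loglik(0.00319, -0.63722, 0.74041, fire_data)
```

A numerical optimizer applied to this function (for example, a simplex search) would be one way to obtain the estimates themselves; the sketch only evaluates the loglikelihood at the reported values.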

An alternative is to construct a data-dependent model for the baseline
hazard rate. Let $R(y_j)$ be the set of observations that are in the risk set for
uncensored observation $y_j$.17 Rather than obtain the true likelihood value, it
is easier to obtain what is called the partial likelihood value. It is a conditional
value. Rather than asking, "What is the probability of observing a value of
$y_j$?" we ask, "Given that it is known there is an uncensored observation of
$y_j$, what is the probability that it was the policy that had that value? Do
this conditioned on equalling or exceeding that value." This method allows
us to estimate the β coefficients separately from the baseline hazard rate.
Notation can become a bit awkward here. Let j* identify the observation
that produced the uncensored observation of $y_j$. Then the contribution to the
likelihood function for that policy is

$$\frac{f_{j^*}(y_j)/S_{j^*}(y_j)}{\sum_{i\in R(y_j)} f_i(y_j)/S_i(y_j)} = \frac{c_{j^*}\,f_0(y_j)/S_0(y_j)}{\sum_{i\in R(y_j)} c_i\,f_0(y_j)/S_0(y_j)} = \frac{c_{j^*}}{\sum_{i\in R(y_j)} c_i}.$$

Example 12.66 Use the partial likelihood to estimate $\beta_1$ and $\beta_2$.

The ordered, uncensored values are 8, 22, 51, 55, 70, 81, and 93. The
calculation of the contribution to the likelihood function is in Table 12.21.

The product is maximized when $\hat\beta_1 = -0.00373$ and $\hat\beta_2 = -0.91994$ and
the logarithm of the partial likelihood is -11.9889. When $\beta_1$ is forced to
be zero, the maximum is at $\hat\beta_2 = -0.93708$ and the logarithm of the partial
likelihood is -11.9968. There is no evidence in this sample that age of the
house has an impact when using this model. □
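The partial likelihood of Example 12.66 can be evaluated with a short Python sketch. The function name is an illustrative assumption; the data are those of Table 12.20, and the risk set for an uncensored value consists of every observation (censored or not) whose value equals or exceeds it.

```python
import math

# (z1, z2, payment in percent, paid at policy limit) from Table 12.20
fire_data = [
    (10, 0, 70, False), (20, 0, 22, False), (30, 0, 90, True),
    (40, 0, 81, False), (50, 0, 8, False),  (10, 1, 51, False),
    (20, 1, 95, True),  (30, 1, 55, False), (40, 1, 85, True),
    (50, 1, 93, False),
]

def log_partial_likelihood(beta1, beta2, data):
    """Sum over uncensored observations of ln(c_j / sum of c_i over the risk set)."""
    c = [math.exp(beta1 * z1 + beta2 * z2) for z1, z2, _, _ in data]
    total = 0.0
    for j, (_, _, y, at_limit) in enumerate(data):
        if at_limit:
            continue  # censored observations contribute only through risk sets
        risk_sum = sum(c[i] for i, (_, _, x, _) in enumerate(data) if x >= y)
        total += math.log(c[j] / risk_sum)
    return total

lpl_full = log_partial_likelihood(-0.00373, -0.91994, fire_data)
lpl_no_age = log_partial_likelihood(0.0, -0.93708, fire_data)
```

The two values should agree with the logarithms reported in the example; their small difference is what underlies the conclusion that the age of the house has little impact.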

Three issues remain. One is to estimate the baseline hazard rate function, 
one is to deal with the case where there are multiple observations at the same 

17 Recall from Section 11.1 that $y_1, y_2, \ldots$ represent the ordered, unique values from the set
of uncensored observations. The risk set was also defined in that section.

Table 12.21 Fire insurance likelihood

Value    y     c                               Contribution to L
8        8     c_1 = exp(50 β_1)               c_1/(c_1 + ... + c_10)
22       22    c_2 = exp(20 β_1)               c_2/(c_2 + ... + c_10)
51       51    c_3 = exp(10 β_1 + β_2)         c_3/(c_3 + ... + c_10)
55       55    c_4 = exp(30 β_1 + β_2)         c_4/(c_4 + ... + c_10)
70       70    c_5 = exp(10 β_1)               c_5/(c_5 + ... + c_10)
81       81    c_6 = exp(40 β_1)               c_6/(c_6 + ... + c_10)
85             c_7 = exp(40 β_1 + β_2)
90             c_8 = exp(30 β_1)
93       93    c_9 = exp(50 β_1 + β_2)         c_9/(c_9 + c_10)
95             c_10 = exp(20 β_1 + β_2)

value, and the final one is to estimate the variances of estimators. For the
second problem, there are a number of approaches in the literature. The
question raised earlier could be rephrased as "Given that it is known there
are $s_j$ uncensored observations of $y_j$, what is the probability that it was the
$s_j$ policies that actually had that value? Do this conditioned on equalling
or exceeding that value." A direct interpretation of this statement would
have the numerator reflect the probability of the $s_j$ observations that were


each of the $s_j$ observations separately but, for the denominator, uses the same
risk set for all of them. The effect is to require no change from the algorithm
introduced above.

Example 12.67 In the previous example, suppose that the observation of 81 
had actually been 70. Give the contribution to the partial likelihood function 
for these two observations. 

Using the notation from that example, the contribution for the first
observation of 70 would still be $c_5/(c_5 + \cdots + c_{10})$. However, the second observation
of 70 would now contribute $c_6/(c_5 + \cdots + c_{10})$. Note that the numerator has
not changed (it is still $c_6$); however, the denominator reflects the fact that
there are six observations in R(70). □

Table 12.22 Fire insurance baseline survival function

Value    y     c         Jump                                  H_0(y)    S_0(y)
8        8     0.8300    1/(0.8300 + ... + 0.3699) = 0.1597    0.1597    0.8524
22       22    0.9282    1/(0.9282 + ... + 0.3699) = 0.1841    0.3438    0.7091
51       51    0.3840    1/(0.3840 + ... + 0.3699) = 0.2220    0.5658    0.5679
55       55    0.3564    1/(0.3564 + ... + 0.3699) = 0.2427    0.8086    0.4455
70       70    0.9634    1/(0.9634 + ... + 0.3699) = 0.2657    1.0743    0.3415
81       81    0.8615    1/(0.8615 + ... + 0.3699) = 0.3572    1.4315    0.2390
85             0.3434
90             0.8942
93       93    0.3308    1/(0.3308 + 0.3699) = 1.4271          2.8586    0.0574
95             0.3699

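To employ an analog of the Nelson-Aalen estimate, we use

$$\hat H_0(t) = \sum_{y_j \le t} \frac{s_j}{\sum_{i\in R(y_j)} c_i}.$$

That is, the outer sum is taken over all uncensored observations less than or
equal to t. The numerator is the number of observations having an uncensored
value equal to $y_j$, and the denominator, rather than having the number in
the risk set, adds their c values. As usual, the baseline survival function is
estimated as $\hat S_0(t) = \exp[-\hat H_0(t)]$.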
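The baseline estimate can be sketched in Python using the data of Table 12.20 and the partial likelihood estimates of Example 12.66. The function names are illustrative assumptions. Each uncensored observation adds one jump, so tied values accumulate a numerator equal to the number of ties. The last line assumes, as Examples 12.68 and 12.69 suggest, that z1 is the age of the house and that z2 = 1 indicates a wood house.

```python
import math

# (z1, z2, payment in percent, paid at policy limit) from Table 12.20
fire_data = [
    (10, 0, 70, False), (20, 0, 22, False), (30, 0, 90, True),
    (40, 0, 81, False), (50, 0, 8, False),  (10, 1, 51, False),
    (20, 1, 95, True),  (30, 1, 55, False), (40, 1, 85, True),
    (50, 1, 93, False),
]
BETA1, BETA2 = -0.00373, -0.91994  # partial likelihood estimates from Example 12.66

def baseline_cumulative_hazard(t, beta1, beta2, data):
    """Nelson-Aalen analog: sum over uncensored y_j <= t of 1 / (sum of c over the risk set)."""
    c = [math.exp(beta1 * z1 + beta2 * z2) for z1, z2, _, _ in data]
    total = 0.0
    for _, _, y, at_limit in data:
        if at_limit or y > t:
            continue
        risk_sum = sum(c_i for c_i, (_, _, x, _) in zip(c, data) if x >= y)
        total += 1.0 / risk_sum
    return total

def baseline_survival(t, beta1, beta2, data):
    """Estimated baseline survival function exp(-H_0(t))."""
    return math.exp(-baseline_cumulative_hazard(t, beta1, beta2, data))

def survival(t, z1, z2, beta1, beta2, data):
    """Estimated survival function S_0(t)**c for covariates (z1, z2)."""
    return baseline_survival(t, beta1, beta2, data) ** math.exp(beta1 * z1 + beta2 * z2)

h0_93 = baseline_cumulative_hazard(93, BETA1, BETA2, fire_data)
s0_8 = baseline_survival(8, BETA1, BETA2, fire_data)
p_exceed_80 = survival(80, 35, 1, BETA1, BETA2, fire_data)  # 35-year-old wood house
```

The cumulative hazard and survival values computed this way should match the last two columns of Table 12.22.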


Example 12.68 For the continuing example (using the original values),
estimate the baseline survival function and then estimate the probability that
a claim for a 35-year-old wood house will exceed 80% of the house's value.
Compare this to the value obtained from the beta distribution model obtained
earlier.


Using the estimates obtained earlier, the 10 c values are as given in Table 
12.22. Also included is the jump in the cumulative hazard estimate, followed 
by the estimate of the cumulative hazard function itself. Values for that 















Example 12.69 Obtain the information matrix and estimated covariance 
matrix for the continuing example. Then use this to produce a 95% confi¬ 
dence interval for the relative risk of a wood house versus a brick house of the 
same age. 

Consider the entry in the outer sum for the observation with $z_1 = 50$
and $z_2 = 1$. The risk set contains this observation (with a value of 93 and
c = 0.330802) and the censored observation with $z_1 = 20$ and $z_2 = 1$ (with a
value of 95 and c = 0.369924). For the derivative with respect to $\beta_1$ and $\beta_2$
the entry is

$$\frac{50(1)(0.330802) + 20(1)(0.369924)}{0.330802 + 0.369924} - \frac{[50(0.330802) + 20(0.369924)]\,[1(0.330802) + 1(0.369924)]}{(0.330802 + 0.369924)^2} = 0.$$


The relative risk is the ratio of the c values for the two cases. For a house of
age $z_1$, the relative risk of wood versus brick is $e^{z_1\beta_1+\beta_2}/e^{z_1\beta_1} = e^{\beta_2}$. A 95%
confidence interval for $\beta_2$ is $-0.91994 \pm 1.96\sqrt{0.774125}$ or (-2.6444, 0.80455).
Exponentiating the endpoints gives the confidence interval for the relative
risk, (0.07105, 2.2357). □

12.7.3 The generalized linear and accelerated failure time models

The proportional hazards model requires a particular relationship between


ibution, a model not suitable for most phenomena of mt 彳 
te generalized linear model drops that restriction and so 
A comprehensive reference is [90] and actuarial papers i 
ie [47], [61], [93], and [97]. The definition of this model ^ 
more general than the usual one. 

70 Suppose a parametric distribution has parameters fi 
e mean and 0 is a vector of additional parameters. Let it 
'he mean must not depend on the additional parameters 


lualj let p be a vector of coefficients, and 
generalized linear model then states that 
listribution function 

F{x\z,0) = F(x\/j,,e), 


that the mean of an individual observation 













distributions, it has been possible to develop the full set of regression tools, 
such as residual analysis. Computer packages that implement the generalized 
linear model use only these distributions. 

For many of the distributions we have been using, the mean is not a
parameter. However, it could be. For example, we could parameterize the Pareto
distribution by setting μ = θ/(α - 1) or, equivalently, replacing θ with μ(α - 1).
The distribution function is now

$$F(x|\mu,\alpha) = 1 - \left[\frac{\mu(\alpha-1)}{\mu(\alpha-1)+x}\right]^{\alpha}, \quad \mu > 0,\ \alpha > 1.$$

Note the restriction on α in the parameter space.

Example 12.72 Construct a generalized linear model for the data set in
Example 12.65 using a beta distribution for the loss model.

The beta distribution as parameterized in Appendix A has a mean of
$\mu = a/(a+b)$. Let the other parameter be $\theta = b$. One way of linking
the covariates to the mean is to use $\eta(\mu) = \mu/(1-\mu)$ and
$c(\beta^T z) = \exp(\beta^T z)$. Setting these equal and solving yields

$$\mu = \frac{\exp(\beta^T z)}{1 + \exp(\beta^T z)}.$$

Solving the first two equations yields $a = b\mu/(1-\mu) = b\exp(\beta^T z)$.
Maximum likelihood estimation proceeds by using a factor of f(x) for each
uncensored observation and 1 - F(x) for each censored observation. For each
observation, the beta distribution uses the parameter b directly, and the
parameter a from the value of b and the covariates for that observation.
Because there is no baseline distribution, the expression $\beta^T z$ must
include a constant term. Maximizing the likelihood function yields the
estimates $\hat{b} = 0.5775$, $\hat{\beta}_0 = 0.2130$,
$\hat{\beta}_1 = 0.0018$, and $\hat{\beta}_2 = 1.0940$. As in Example 12.65,
the impact of age is negligible. One advantage of this model is that the mean
is directly linked to the covariates. □
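
The link between the covariates and the beta parameters can be sketched in a few lines. This is an illustration only: the estimates are those quoted in the example, but the covariate vector (a constant, an age of 40, and a second covariate set to 1) is hypothetical, and the function name is not from the text.

```python
import math


def beta_glm_parameters(b, beta, z):
    """Return (mu, a) for one observation under the link used in the example."""
    linear = sum(bj * zj for bj, zj in zip(beta, z))
    odds = math.exp(linear)      # mu / (1 - mu) = exp(beta^T z)
    mu = odds / (1.0 + odds)     # mean implied by the covariates
    a = b * odds                 # follows from mu = a / (a + b)
    return mu, a


b_hat = 0.5775
beta_hat = [0.2130, 0.0018, 1.0940]   # constant, age, second covariate
z = [1.0, 40.0, 1.0]                  # hypothetical covariate values
mu, a = beta_glm_parameters(b_hat, beta_hat, z)
print(round(mu, 4), round(a, 4))
```

Whatever covariates are supplied, the returned pair satisfies mu = a/(a + b), which is the sense in which the mean is directly linked to the covariates.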

A model that is similar in spirit to the generalized linear model is the 
accelerated failure time model as described below. 

Definition 12.73 The accelerated failure time model is defined from 


S(x\z,fi) = Sq{ 


luc puruirLtiLC 

Gompertz distribution. 
The Gompertz dist 









pressure. For an individual insured, 
ted failure time model implies that 


























If there are d x deaths at age a:, the contribution to 
(where a binomial distribution has been assumed fo 

d x ]nq x + (1000 — d x ) ln(l - 




The likelihood function is maximi 
1110, and /3 2 = - 0.0144. Being femj 
birth) by a factor of exp(O.HO) = 1. 
lowers the expected lifetime by 1 — 
decrease. 




0.000243, c = 1.0 
s the expected life 








struct each of the following three models:


Use four different Nelson-Aalen estimates, keeping the four groups 
separate. 

Use a proportional hazards model where the baseline distribution 
has the exponential distribution. 

Use a proportional hazards model with an empirical estimate of 
the baseline distribution. 


s ) The duration of a strike follows a Cox proportional hazards model 












13 

Model selection 


13.1 INTRODUCTION 

When using data to build a model, the process must end with the announcement
of a "winner." While qualifications, limitations, caveats, and other
attempts to escape full responsibility are appropriate, and often necessary, a
commitment to a solution is often required. In this chapter we look at a vari-


whatever model we select it is only an approximation 
cted in the following modeler’s motto 1 : 


of 


All models are wrong, but some models are useful 


Thus, our goal is to determine a model that is good enough to use to 
answer the question. The challenge here is that the definition of good enough 
will depend on the particular application. Another important modeling point 
is that a solid understanding of the question will guide you to the answer. 
The following quote from John Tukey [131], pp. 13-14 sums this up: 


Far better an approximate answer to the right question, which is often 
vague, than an exact answer to the wrong question, which can always
be made precise. 


1 It is usually attributed to George Box.

Loss Models: From Data to Decisions, Second Edition.

By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.




the truncation point in the data set be t. The modified functions are 


a variety of evaluation and selection tools. Each tool has its own strengths 
and weaknesses, and it is possible for different tools to lead to different mod¬ 
els. This makes modeling as much art as science. At times, in real-world 
applications, the model’s purpose may lead the analyst to favor one tool over 
another. 


13.2 REPRESENTATIONS OF THE DATA AND MODEL 

All the approaches to be presented attempt to compare the proposed model 
to the data or to another model. The proposed model is represented by¬ 


quantities 




function. 




complete d£ 
censored, di 









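$$F^*(x) = \begin{cases} 0, & x < t, \\ \dfrac{F(x) - F(t)}{1 - F(t)}, & x \ge t, \end{cases}$$

$$f^*(x) = \begin{cases} 0, & x < t, \\ \dfrac{f(x)}{1 - F(t)}, & x \ge t. \end{cases}$$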
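
The modified functions for left-truncated data can be sketched as a small wrapper around any distribution and density function. This is an illustration under stated assumptions: the helper name `truncate` is hypothetical, and the exponential model with parameter 802.32 and truncation point 50 is the one that appears later in this chapter's p-p plot calculation.

```python
import math


def truncate(cdf, pdf, t):
    """Return (F*, f*), the model functions conditioned on exceeding t."""
    tail = 1.0 - cdf(t)

    def cdf_star(x):
        return 0.0 if x < t else (cdf(x) - cdf(t)) / tail

    def pdf_star(x):
        return 0.0 if x < t else pdf(x) / tail

    return cdf_star, pdf_star


theta = 802.32  # exponential parameter used in the chapter's running example


def exp_cdf(x):
    return 1.0 - math.exp(-x / theta)


def exp_pdf(x):
    return math.exp(-x / theta) / theta


F_star, f_star = truncate(exp_cdf, exp_pdf, 50.0)
print(round(F_star(82.0), 4))  # 0.0391
```

For the exponential distribution the wrapper reduces to F*(x) = 1 - exp(-(x - t)/theta), which is why only the excess over 50 matters.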

In this chapter, when a distribution function or density function is
indicated, a subscript equal to the sample size indicates that it is the
empirical model (from Kaplan-Meier, Nelson-Aalen, the ogive, etc.) while no
adornment, or the use of an asterisk (*), indicates the estimated parametric
model. There is no notation for the true, underlying distribution because it
is unknown and unknowable.










7,500. Estimate the parameter of an exponential model for each data set. Plot
the appropriate functions and comment on the quality of the fit of the model.
Repeat this for Data Set B censored at 1,000 (without any truncation).

















































plot $D(x) = F_n(x) - F^*(x)$.

Example 13.2 Plot D(x) for the previous example.


For Data Set B truncated at 50, the plot appears in Figure 13.4. The lack 
of fit for this model is magnified in this plot. 

There is no corresponding plot for grouped data. For Data Set B censored 
at 1,000, the plot must again end at that value. It appears in Figure 13.5. 
The lack of fit continues to be apparent. □

Another way to highlight any differences is the p-p plot, which is also
called a probability plot. The plot is created by ordering the observations as
$x_1 \le \cdots \le x_n$. A point is then plotted corresponding to each value.
The coordinates to plot are $(F_n(x_j), F^*(x_j))$.4 If the model fits well,
the plotted


bly called a q-q 














x = 82. The empirical value is $F_n(82) = 1/20 = 0.05$. The other coordinate is

$$F^*(82) = 1 - e^{-(82-50)/802.32} = 0.0391.$$

One of the plotted points will be (0.05,0.0391). The complete picture appears 
in Figure 13.6. 

From the lower left part of the plot it is clear that the exponential model
places less probability on small values than the data call for. A p-p plot
can be constructed for Data Set B censored at 1,000 and it appears in Figure

This plot ends at about 0.75 because that is the highest probability observed
prior to the censoring point at 1,000. There are no empirical values at
higher probabilities. Again, the exponential model tends to underestimate
the empirical values. □
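
The coordinates of the p-p plot for Data Set B truncated at 50 can be sketched as follows. Two assumptions are made here: the empirical coordinate is taken as j/(n + 1), which is what the quoted value of 0.05 for the first of 19 points implies, and the data values are those listed in Table 13.3.

```python
import math

# Data Set B truncated at 50 (the 19 values listed in Table 13.3)
data = [82, 115, 126, 155, 161, 243, 294, 340, 384, 457,
        680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
theta, t = 802.32, 50.0   # fitted exponential parameter and truncation point
n = len(data)

points = []
for j, x in enumerate(sorted(data), start=1):
    empirical = j / (n + 1)                       # assumed plotting position
    model = 1.0 - math.exp(-(x - t) / theta)      # F*(x) for the truncated model
    points.append((empirical, model))

first = points[0]
print(round(first[0], 4), round(first[1], 4))  # 0.05 0.0391
```

Points lying below the 45-degree line, such as the first one, are those where the model assigns less probability than the data.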

13.3.1 Exercises 


13.1 Repeat Example 13.1 using a Weibull model in place of the exponential 
model. 


A picture may be worth many words, but sometimes it is best to replace the 
impressions conveyed by pictures with mathematical demonstrations. One 
such demonstration is a test of the hypotheses 

$H_0$ : The data came from a population with the stated model.

$H_1$ : The data did not come from such a population.

The test statistic is usually a measure of how close the model distribution 
Type II error while lowering the probability of a Type I error.5 For actuarial
modeling this is likely to be an acceptable trade-off. 

One method of avoiding the approximation is to randomly divide the sam¬ 
ple in half. Use one half to estimate the parameters and then use the other 
half to conduct the hypothesis test. Once the model is selected, the full data 
set could be used to reestimate the parameters. 

13.4.1 Kolmogorov-Smirnov test

Let t be the left truncation point (t = 0 if there is no truncation) and let u
be the right censoring point ($u = \infty$ if there is no censoring). Then,
the test statistic is

$$D = \max_{t \le x \le u} \left| F_n(x) - F^*(x) \right|.$$

This test should only be used on individual data. This is to ensure that
the step function $F_n(x)$ is well defined. Also, the model distribution
function $F^*(x)$ is assumed to be continuous over the relevant range.

Example 13.4 Calculate D for Example 13.1.

Table 13.3 provides the needed values. Because the empirical distribution
function jumps at each data point, the model distribution function must be
compared both before and after the jump. The values just before the jump
are denoted $F_n(x^-)$ in the table. The maximum is D = 0.1340.

For Data Set B censored at 1,000, 15 of the 20 observations are uncensored.
Table 13.4 illustrates the needed calculations. The maximum is D = 0.0991. □
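
The calculation of D can be sketched as below. The function name and its arguments are illustrative, the data are assumed to contain no ties, and the exponential parameter 718 for the censored case is an assumption chosen because it reproduces the model values of Table 13.4.

```python
import math


def ks_statistic(uncensored, cdf, n=None, u=None):
    """Largest gap between the model and the empirical step function.

    uncensored: observed (uncensored) values, assumed distinct
    n: total sample size, if censored observations are also present
    u: censoring point, if any
    """
    xs = sorted(uncensored)
    n = len(xs) if n is None else n
    d = 0.0
    for j, x in enumerate(xs, start=1):
        model = cdf(x)
        # compare just before the jump, F_n(x-), and just after it, F_n(x)
        d = max(d, abs(model - (j - 1) / n), abs(model - j / n))
    if u is not None:
        d = max(d, abs(cdf(u) - len(xs) / n))
    return d


truncated = [82, 115, 126, 155, 161, 243, 294, 340, 384, 457,
             680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
d_truncated = ks_statistic(
    truncated, lambda x: 1.0 - math.exp(-(x - 50.0) / 802.32))

uncensored = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974]
d_censored = ks_statistic(
    uncensored, lambda x: 1.0 - math.exp(-x / 718.0), n=20, u=1000.0)

print(round(d_truncated, 4), round(d_censored, 4))
```

The two results should agree with the maxima of 0.1340 and 0.0991 quoted in the example, up to rounding of the fitted parameters.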

All that remains is to determine the critical value. Commonly used critical
values for this test are $1.22/\sqrt{n}$ for $\alpha = 0.10$, $1.36/\sqrt{n}$
for $\alpha = 0.05$, and $1.63/\sqrt{n}$ for $\alpha = 0.01$. When
$u < \infty$, the critical value should be smaller because there is less
opportunity for the difference to become large. Modifications for this
phenomenon exist in the literature (see [125], for example, which also
includes tables of critical values for specific null distribution models), and
one such modification is given in [109] but will not be introduced here.

Example 13.5 Complete the Kolmogorov-Smirnov test for the previous example.

For Data Set B truncated at 50 the sample size is 19. The critical value at
a 5% significance level is $1.36/\sqrt{19} = 0.3120$. Because 0.1340 < 0.3120,
the null hypothesis is not rejected and the exponential distribution is a
plausible

5 Among the tests presented here, only the chi-square test has a built-in
correction for this situation. Modifications for the other tests have been
developed, but they will not be presented here.
Table 13.3 Calculation of D for Example 13.4

x        F*(x)     F_n(x-)   F_n(x)    Maximum difference
82       0.0391    0.0000    0.0526    0.0391
115      0.0778    0.0526    0.1053    0.0275
126      0.0904    0.1053    0.1579    0.0675
155      0.1227    0.1579    0.2105    0.0878
161      0.1292    0.2105    0.2632    0.1340
243      0.2138    0.2632    0.3158    0.1020
294      0.2622    0.3158    0.3684    0.1062
340      0.3033    0.3684    0.4211    0.1178
384      0.3405    0.4211    0.4737    0.1332
457      0.3979    0.4737    0.5263    0.1284
680      0.5440    0.5263    0.5789    0.0349
855      0.6333    0.5789    0.6316    0.0544
877      0.6433    0.6316    0.6842    0.0409
974      0.6839    0.6842    0.7368    0.0529
1,193    0.7594    0.7368    0.7895    0.0301
1,340    0.7997    0.7895    0.8421    0.0424
1,884    0.8983    0.8421    0.8947    0.0562
2,558    0.9561    0.8947    0.9474    0.0614
3,476    0.9860    0.9474    1.0000    0.0386

model. While it is unlikely that the exponential model is appropriate for
this population, the sample size is too small to lead to that conclusion. For
Data Set B censored at 1,000 the sample size is 20 and so the critical value
is $1.36/\sqrt{20} = 0.3041$ and the exponential model is again viewed as
being plausible. □

For both this test and the Anderson-Darling test that follows, the critical
values are correct only when the null hypothesis completely specifies the
model. When the data set is used to estimate parameters for the null
hypothesized distribution (as in the example), the correct critical value is
smaller. For both tests, the change depends on the particular distribution
that is hypothesized and maybe even on the particular true values of the
parameters. An indication of how simulation can be used for this situation is
presented in Section 17.2.4.
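
One way such a simulation might look is sketched below. It is not the procedure of Section 17.2.4, only an illustration under simplifying assumptions: complete data (no truncation or censoring), an exponential null hypothesis whose parameter is re-estimated from every simulated sample, 2,000 trials, and a fixed seed. All names are hypothetical.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance for complete individual data."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(abs(cdf(x) - (j - 1) / n), abs(cdf(x) - j / n))
        for j, x in enumerate(xs, start=1)
    )


def simulated_critical_value(n, level=0.05, trials=2000, seed=1):
    """Upper `level` point of D when the exponential mean is estimated."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        theta_hat = sum(sample) / n   # maximum likelihood estimate
        stats.append(
            ks_statistic(sample, lambda x: 1.0 - math.exp(-x / theta_hat)))
    stats.sort()
    index = min(trials, int(math.ceil((1.0 - level) * trials))) - 1
    return stats[index]


critical = simulated_critical_value(19)
print(round(critical, 4), round(1.36 / math.sqrt(19), 4))
```

Because the exponential distribution is a scale family, simulating with a true mean of 1 loses no generality in this sketch. The simulated value comes out below the tabulated 0.3120, consistent with the statement that the correct critical value is smaller when parameters are estimated.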









x        F*(x)     F_n(x-)   F_n(x)    Maximum difference
27       0.0369    0.00      0.05      0.0369
82       0.1079    0.05      0.10      0.0579
115      0.1480    0.10      0.15      0.0480
126      0.1610    0.15      0.20      0.0390
155      0.1942    0.20      0.25      0.0558
161      0.2009    0.25      0.30      0.0991
243      0.2871    0.30      0.35      0.0629
294      0.3360    0.35      0.40      0.0640
340      0.3772    0.40      0.45      0.0728
384      0.4142    0.45      0.50      0.0858
457      0.4709    0.50      0.55      0.0791
680      0.6121    0.55      0.60      0.0621
855      0.6960    0.60      0.65      0.0960
877      0.7052    0.65      0.70      0.0552
974      0.7425    0.70      0.75      0.0425
1,000    0.7516    0.75      0.75      0.0016


13.4.2 Anderson-Darling test

This test is similar to the Kolmogorov-Smirnov test, but uses a different
measure of the difference between the two distribution functions. The test
statistic is

$$A^2 = n \int_t^u \frac{[F_n(x) - F^*(x)]^2}{F^*(x)[1 - F^*(x)]} f^*(x)\,dx.$$

That is, it is a weighted average of the squared differences between the
empirical and model distribution functions. Note that when x is close to t or
to u the weights might be very large due to the small value of one of the
factors in the denominator. This test statistic tends to place more emphasis
on good fit in the tails than in the middle of the distribution. Calculating
with this formula appears to be challenging. However, for individual data (so
this is another test that does not work for grouped data), the integral
simplifies to

$$A^2 = -nF^*(u) + n\sum_{j=0}^{k} [1 - F_n(y_j)]^2 \{\ln[1 - F^*(y_j)] - \ln[1 - F^*(y_{j+1})]\} + n\sum_{j=1}^{k} F_n(y_j)^2 [\ln F^*(y_{j+1}) - \ln F^*(y_j)],$$

where the unique noncensored data points are
$t = y_0 < y_1 < \cdots < y_k < y_{k+1} = u$. Note that when $u = \infty$ the
last term of the first sum is zero
Table 13.4 Calculation of D for Example 13.4 with censoring

x        F*(x)     F_n(x-)   F_n(x)    Maximum difference
Table 13.5 Anderson-Darling test for Example 13.6

j     y_j      F*(x)     F_n(x)    Summand
0     50       0.0000    0.0000    0.0399
1     82       0.0391    0.0526    0.0388
2     115      0.0778    0.1053    0.0126
3     126      0.0904    0.1579    0.0332
4     155      0.1227    0.2105    0.0070
5     161      0.1292    0.2632    0.0904
6     243      0.2138    0.3158    0.0501
7     294      0.2622    0.3684    0.0426
8     340      0.3033    0.4211    0.0389
9     384      0.3405    0.4737    0.0601
10    457      0.3979    0.5263    0.1490
11    680      0.5440    0.5789    0.0897
12    855      0.6333    0.6316    0.0099
13    877      0.6433    0.6842    0.0407
14    974      0.6839    0.7368    0.0758
15    1,193    0.7594    0.7895    0.0403
16    1,340    0.7997    0.8421    0.0994
17    1,884    0.8983    0.8947    0.0592
18    2,558    0.9561    0.9474    0.0308
19    3,476    0.9860    1.0000    0.0141
20    ∞        1.0000    1.0000

[evaluating the formula as written will ask for ln(0)]. The critical values are
1.933, 2.492, and 3.857 for 10, 5, and 1% significance levels, respectively. As
with the Kolmogorov-Smirnov test, the critical value should be smaller when
$u < \infty$.

Example 13.6 Perform the Anderson-Darling test for the continuing example.

For Data Set B truncated at 50, there are 19 data points. The calculation
is in Table 13.5, where "summand" refers to the sum of the corresponding
terms from the two sums. The total is 1.0226 and the test statistic is
-19(1) + 19(1.0226) = 0.4292. Because the test statistic is less than the
critical value of 2.492, the exponential model is viewed as plausible.

For Data Set B censored at 1,000, the results are in Table 13.6. The total
is 0.7602 and the test statistic is -20(0.7516) + 20(0.7602) = 0.1713. Because
the test statistic does not exceed the critical value of 2.492, the exponential
model is viewed as plausible. □
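
The simplified sum for individual data can be sketched as follows. The function and argument names are illustrative. The zero-weight last term of the first sum is skipped explicitly so that ln(0) is never requested when u is infinite, and the exponential parameter 718 for the censored case is an assumption chosen to match the quoted model value of 0.7516 at 1,000.

```python
import math


def anderson_darling(uncensored, cdf, n=None, t=0.0, u=math.inf):
    """Anderson-Darling statistic from the two-sum formula for individual data."""
    n = len(uncensored) if n is None else n
    ys = sorted(set(uncensored))
    y = [t] + ys + [u]                 # y_0 = t, ..., y_{k+1} = u
    k = len(ys)

    def model(x):
        return 1.0 if math.isinf(x) else cdf(x)

    def empirical(x):
        return sum(1 for v in uncensored if v <= x) / n

    first = 0.0
    for j in range(k + 1):
        weight = (1.0 - empirical(y[j])) ** 2
        if weight > 0.0:               # skip the term that would need ln(0)
            first += weight * (math.log(1.0 - model(y[j]))
                               - math.log(1.0 - model(y[j + 1])))
    second = 0.0
    for j in range(1, k + 1):
        second += empirical(y[j]) ** 2 * (math.log(model(y[j + 1]))
                                          - math.log(model(y[j])))
    return -n * model(u) + n * (first + second)


truncated = [82, 115, 126, 155, 161, 243, 294, 340, 384, 457,
             680, 855, 877, 974, 1193, 1340, 1884, 2558, 3476]
a2_truncated = anderson_darling(
    truncated, lambda x: 1.0 - math.exp(-(x - 50.0) / 802.32), t=50.0)

uncensored = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
              457, 680, 855, 877, 974]
a2_censored = anderson_darling(
    uncensored, lambda x: 1.0 - math.exp(-x / 718.0), n=20, u=1000.0)

print(round(a2_truncated, 4), round(a2_censored, 4))
```

The results should be close to the values 0.4292 and 0.1713 reported in the example, with small differences attributable to rounding of the fitted parameters.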


7256513404705740 

2812564948585770 

11112233468890 







Range 


50-150 

150-250 

250-500 

500-1,000 

1,000-2,000 

2,000-oo 


be at least 5. Som( 
agree the test work 
If the data are gro 
although adjacent 
data, the data can 

Example 13.7 Pt 

distribution for the 

All three data s 
cated at 50, establi 
The calculations a] 
degrees of freedom 


from term to term. 
le groups as given, 
Ei. For individual 


with the Excel® function CHIINV(.05,4)) and the p-value is 0.8436 [from
CHIDIST(1.4034,4)]. The exponential model is a good fit.

For Data Set B censored at 1,000, the first interval is from 0-150 and the
last interval is from 1,000-∞. Unlike the previous two tests, the censored
observations can be used. The calculations are in Table 13.8. The total is
$\chi^2 = 0.5951$. With three degrees of freedom (5 rows minus 1 minus 1
estimated parameter) the critical value for a test at a 5% significance level
is 7.8147 and the p-value is 0.8976. The exponential model is a good fit.
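
The censored-data calculation can be sketched with the standard library alone. The helper `chi2_sf` is a hypothetical name for the chi-square upper tail probability, written from its closed forms for integer degrees of freedom, and the exponential parameter 718 is an assumption chosen because it reproduces the probabilities in Table 13.8.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom exceeds x), df a positive integer."""
    z = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= z / i
            total += term
        return math.exp(-z) * total
    total = math.erfc(math.sqrt(z))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-z) * z ** (i - 0.5) / math.gamma(i + 0.5)
    return total


theta, n = 718.0, 20
bounds = [0.0, 150.0, 250.0, 500.0, 1000.0, math.inf]
observed = [4, 3, 4, 4, 5]            # last group holds the censored values


def cdf(x):
    return 1.0 if math.isinf(x) else 1.0 - math.exp(-x / theta)


expected = [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(bounds, bounds[1:])]
statistic = sum((e - o) ** 2 / e for e, o in zip(expected, observed))
df = len(observed) - 1 - 1            # groups, minus 1, minus 1 parameter
p_value = chi2_sf(statistic, df)
print(round(statistic, 4), round(p_value, 4))
```

The same helper gives the p-value of 0.8436 quoted for the truncated data when called as `chi2_sf(1.4034, 4)`.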

For Data Set C the groups are already in place. The results are given in Table
13.9.

' le is about 10 一 There is clear evidence 

appropriate. A more accurate test would 
Table 13.8 Data Set B censored at 1,000

Range        p̂         Expected   Observed   χ²
0-150        0.1885    3.771      4          0.0139
150-250      0.1055    2.110      3          0.3754
250-500      0.2076    4.152      4          0.0055
500-1,000    0.2500    5.000      4          0.2000
1,000-∞      0.2484    4.968      5          0.0002
Total        1         20         20         0.5951
Table 13.10 Automobile claims by year

Year    Exposure   Claims
1986    2,145      207
1987    2,452      227
1988    3,112      341
1989    3,458      335
1990    3,698      362
1991    3,872      359


Table 13.9 Data Set C

Range              p̂         Expected   Observed   χ²
7,500-17,500       0.2023    25.889     42         10.026
17,500-32,500      0.2293    29.356     29         0.004
32,500-67,500      0.3107    39.765     28         3.481
67,500-125,000     0.1874    23.993     17         2.038
125,000-300,000    0.0689    8.824      9          0.003
300,000-∞          0.0013    0.172      3          46.360
Total              1         128        128        61.913


combine the last two groups (because the expected count in the last group is
less than 1). The group from 125,000 to infinity has an expected count of 8.997
and an observed count of 12 for a contribution of 1.002. The test statistic is
now 16.552 and with three degrees of freedom the p-value is 0.00087. The test
continues to reject the exponential model. □


Sometimes, the test can be modified to fit different situations. The following
example illustrates this for aggregate frequency data.

Example 13.8 Conduct an approximate goodness-of-fit test for the Poisson
model determined in Example 12.60. The data are repeated in Table 13.10.

For each year we are assuming that the number of claims is the result of
the sum of a number (given by the exposure) of independent and identical
random variables. In that case the central limit theorem indicates that a
normal approximation may be appropriate. The expected count ($E_k$) is the
exposure times the estimated expected value for one exposure unit while the
variance ($V_k$) is the exposure times the estimated variance for one exposure
unit. The test statistic is then

$$Q = \sum_{k} \frac{(n_k - E_k)^2}{V_k}$$

and has an approximate chi-square distribution with degrees of freedom equal
to the number of data points less the number of estimated parameters. The
expected count is $E_k = \hat{\lambda} e_k$ and the variance is
$V_k = \hat{\lambda} e_k$ also. The test statistic is

$$Q = \frac{(207 - 209.61)^2}{209.61} + \frac{(227 - 239.61)^2}{239.61} + \frac{(341 - 304.11)^2}{304.11} + \frac{(335 - 337.92)^2}{337.92} + \frac{(362 - 361.37)^2}{361.37} + \frac{(359 - 378.38)^2}{378.38} = 6.19.$$

With five degrees of freedom, the 5% critical value is 11.07 and the Poisson
hypothesis is accepted. □


There is one important point to note about these tests. Suppose the sample
size were to double but sampled values were not much different (imagine each
number showing up twice instead of once). For the Kolmogorov-Smirnov
test, the test statistic would be unchanged, but the critical value would be
smaller. For the Anderson-Darling and chi-square tests, the test statistic
would double while the critical value would be unchanged. As a result, for
larger sample sizes, it is more likely that the null hypothesis (and thus the
proposed model) will be rejected. This should not be surprising. We know that
the null hypothesis is false (it is extremely unlikely that a simple distribution
using a few parameters can explain the complex behavior that produced the
observations) and with a large enough sample size we will have convincing
evidence of that truth. When using these tests we must remember that,
although all our models are wrong, some may be useful.
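
The effect of doubling the sample can be illustrated with the automobile claims data of Table 13.10 and the approximate chi-square statistic for a Poisson model, in which the expected count and the variance for a year both equal the estimated rate times the exposure. The sketch below is an illustration only; the function name is hypothetical and the doubled data set is artificial.

```python
exposure = [2145, 2452, 3112, 3458, 3698, 3872]
claims = [207, 227, 341, 335, 362, 359]


def poisson_q(exposure, claims):
    """Approximate chi-square statistic with E_k = V_k = lambda-hat * e_k."""
    rate = sum(claims) / sum(exposure)       # estimated claims per exposure unit
    return sum((n_k - rate * e_k) ** 2 / (rate * e_k)
               for e_k, n_k in zip(exposure, claims))


q = poisson_q(exposure, claims)

# Artificial data set: every exposure and claim count appears twice as large.
q_doubled = poisson_q([2 * e for e in exposure], [2 * c for c in claims])

critical_value = 11.07   # 5% point with five degrees of freedom, from the text
print(round(q, 2), round(q_doubled, 2), critical_value)
```

The estimated rate is unchanged, the statistic doubles, and with the critical value fixed at 11.07 the doubled data would lead to rejection of the same Poisson model.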
8.77

0.0125 

4.92 

0.1779 

4.54 

0.2091 

4.65 

0.0979 

1.96 

0.3754 


0.2525 


Negative binomial 2 



Poisson-inverse Gaussian 2 

ZM negative binomial 3 

Geometric-negative binomial 3 
Poisson-ETNB 3 


The test statistic is T = 2(-162.293 + 162.466) = 0.346. For a chi-square
distribution with one degree of freedom, the critical value is 3.8415. Because 
0.346 < 3.8415, the null hypothesis is not rejected. The probability that a 
chi-square random variable with one degree of freedom exceeds 0.346 is 0.556, 
a p-value that indicates little support for the alternative hypothesis. □ 
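
The arithmetic of this likelihood ratio test is easy to reproduce. The sketch below assumes one degree of freedom, for which the chi-square tail probability equals erfc(sqrt(T/2)); the function name is illustrative.

```python
import math


def likelihood_ratio_one_df(loglik_null, loglik_alt):
    """Return (T, p-value) for a likelihood ratio test with one extra parameter."""
    t = 2.0 * (loglik_alt - loglik_null)
    if t < 0.0:
        raise ValueError("the alternative should not have the lower loglikelihood")
    return t, math.erfc(math.sqrt(t / 2.0))


t_stat, p_value = likelihood_ratio_one_df(-162.466, -162.293)
print(round(t_stat, 3), round(p_value, 3))  # 0.346 0.556
```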

Example 13.11 (Example 4.42 continued) Members of the (a, b, 0) class were
not sufficient to describe these data. Determine a suitable model.

Thirteen different distributions were fit to the data. The results of that
process revealed six models with p-values above 0.01 for the chi-square
goodness-of-fit test. Information about those models is given in Table 13.11.
The likelihood ratio test indicates that the three-parameter model with the
smallest negative loglikelihood (Poisson-ETNB) is not significantly better
than the two-parameter Poisson-inverse Gaussian model. The latter appears to
be an excellent choice. □

Example 13.12 The estimated value of $\beta_1$ in Example 12.65 is small.
Perform a likelihood ratio test using the beta model to see if age has a
significant impact on losses.

The parameters are reestimated forcing $\beta_1$ to be zero. When this is done,
the estimates are $\hat{\beta}_2 = -0.79193$, $\hat{a} = 1.03118$, and
$\hat{b} = 0.74249$. The value of the logarithm of the likelihood function is
-4.2209. Adding age improves the likelihood by 0.0054, which is not
significant. □

It is tempting to use this test when the alternative distribution simply has
more parameters than the null distribution. In such cases the test is not
appropriate. For example, it is possible for a two-parameter lognormal model
to have a higher loglikelihood value than a three-parameter Burr model. This
produces a negative test statistic, indicating that a chi-square distribution is
not appropriate. When the null distribution is a limiting (rather than special)
case of the alternative distribution, the test may still be used, but the test


Table 13.11 Six useful models for Example 13.11 


Model 


Number of Negative 
parameters loglikelihood x 2 


p-value 


4 9 1 2 p 1 
0 7 5 6 7 5 
48.43.43.43.42.42. 
^ 












cases, provided it is clearly understood that a formal hypothesis test was not
conducted. Further examples and exercises using this test to make decisions 
appear in both the next section and the next chapter. 


13.4.5 Exercises 

13.4 Use the Kolmogorov—Smirnov test to see if a Weibull model is appro¬ 
priate for the data used in Example 13.5. 

13.5 (*) Five observations are made from a random variable. They are 1, 2,
3, 5, and 13. Determine the value of the Kolmogorov-Smirnov test statistic
for the null hypothesis that $f(x) = 2x^{-2}e^{-2/x}$, $x > 0$.

13.6 (*) You are given the following five observations from a random sample:
0.1, 0.2, 0.5, 1.0, and 1.3. Calculate the Kolmogorov-Smirnov test statistic for
the null hypothesis that the population density function is
$f(x) = 2(1 + x)^{-3}$, $x > 0$.

13.7 Perform the Anderson-Darling test of the Weibull distribution for Ex¬ 
ample 13.6. 

13.8 Repeat Example 13.7 for the Weibull model. 

13.9 (*) One hundred and fifty policyholders were observed from the time
they arranged a viatical settlement until their death. No observations were
censored. There were 21 deaths in the first year, 27 deaths in the second year,
39 deaths in the third year, and 63 deaths in the fourth year. The survival
model

is being considered. At a 5% significance level, conduct the chi-square
goodness-of-fit test.

13.10 (*) Each day, for 365 days, the number of claims is recorded. The
results were 50 days with no claims, 122 days with one claim, 101 days with
two claims, 92 days with three claims, and no days with four or more claims.
For a Poisson model determine the maximum likelihood estimate of $\lambda$ and
then perform the chi-square goodness-of-fit test at a 2.5% significance level.
Table 13.12 Data for Exercise 13.11

No. of accidents   Days
0                  209
1                  111
2                  33
3                  7
4                  3
5                  2


13.12 Redo Example 13.8 assuming that each exposure unit has a geometric 
distribution. Conduct the approximate chi-square goodness-of-fit test. Is the 
geometric preferable to the Poisson model? 

13.13 Using Data Set B (with the original largest value), determine if a 
gamma model is more appropriate than an exponential model. Recall that an 
exponential model is a gamma model with a = 1. Useful values were obtained 
in Example 12.8. 

13.14 Use Data Set C to choose a model for the population that produced 
those numbers. Choose from the exponential, gamma, and transformed gamma 
models. Information for the first two distributions was obtained in Example 
12.9 and Exercise 12.21, respectively. 

13.15 Conduct the chi-square goodness-of-fit test for each of the models ob¬ 
tained in Exercise 12.96. 

13.16 Conduct the chi-square goodness-of-fit test for each of the models
obtained in Exercise 12.98.

13.17 Conduct the chi-square goodness-of-fit test for each of the models ob¬ 
tained in Exercise 12.99. 

13.18 For the data in Table 13.20 determine the method of moments esti¬ 
mates of the parameters of the Poisson-Poisson distribution where the sec¬ 
ondary distribution is the ordinary (not zero-truncated) Poisson distribution. 
Perform the chi-square goodness-of-fit test using this model. 

13.19 You are given the data in Table 13.13 which represent results from
23,589 automobile insurance policies. The third column headed “fitted model” 
represents the expected number of losses for a fitted (by maximum likelihood) 
negative binomial distribution. 

(a) Perform the chi-squared goodness-of-fit test at a significance level 
of 5%. 


losses 
Table 13.13 Data for Exercise 13.19

Number of     Number of          Fitted
losses, k     policies, n_k      model
0             20,592             20,596.76
1             2,651              2,631.03
2             297                318.37
3             41                 37.81
4             7                  4.45
5             0                  0.52
6             1                  0.06
≥ 7           0                  0.00


(b) Determine the maximum likelihood estimates of the negative binomial
parameters r and $\beta$. This can be done from the given numbers
without actually maximizing the likelihood function.


13.5 SELECTING A MODEL 
13.5.1 Introduction 


Almost all of the tools are in place for choosing a model. Before outlining
a recommended approach, two important concepts must be introduced. The
first is parsimony. The principle of parsimony states that unless there is
considerable evidence to do otherwise a simpler model is preferred. The reason
is that a complex model may do a great job of matching the data, but that is
no guarantee the model matches the population from which the observations
were sampled. For example, given any set of 10 (x, y) pairs with unique x
values, there will always be a polynomial of degree 9 or less that goes through
all 10 points. But if these points were a random sample, it is highly unlikely
that the population values all lie on that polynomial. However, there may
be a straight line that comes close to the sampled points as well as the other
points in the population. This matches the spirit of most hypothesis tests.
That is, do not reject the null hypothesis (and thus claim a more complex
description of the population holds) unless there is strong evidence to do so.
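
The claim about 10 points and a polynomial of degree 9 or less can be checked directly with Lagrange interpolation. The sketch below uses exact rational arithmetic; the 10 points are arbitrary illustrative values, not data from the text.

```python
from fractions import Fraction


def lagrange_value(points, x):
    """Value at x of the unique polynomial of degree < len(points) through points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)   # basis polynomial factor
        total += term
    return total


# Ten arbitrary (x, y) pairs with unique x values.
points = [(k, (7 * k * k + 3) % 11) for k in range(10)]

fits_every_point = all(lagrange_value(points, x) == y for x, y in points)
between_points = lagrange_value(points, Fraction(17, 2))   # x = 8.5
print(fits_every_point, float(between_points))
```

The polynomial reproduces every sampled point exactly, yet its value between the sampled x values is driven entirely by the noise in the sample, which is the concern that parsimony addresses.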

The second concept does not have a name. It states that, if you try enough 
models, one will look good, even if it is not. Suppose I have 900 models at 
my disposal. For most data sets, it is likely that one of them will fit well, but 
this does not help us learn about the population. 




on the modeler’s judgment while the second set is more formal in the sense 
that most of the time all analysts will reach the same conclusions. That is 
because the decisions are made based on numerical measurements rather than 
charts or graphs. 


13.5.2 Judgment-based approaches 


Using one’s own judgment to select models involves one or more of the three 
concepts outlined below. In all cases, the analyst’s experience is critical. 
First, the decision can be based on the various graphs (or tables based on 


for much of its range of ages. In a time of limited computing power, 
distribution allowed for easier calculation of joint life values.. ’ 
of this model was reasonable, this advantage outweighed the 
but better fitting, model. Similarly, if the Pareto distribution 

Dility insurance both by the analyst’s company 
e more than the usual amount of evidence to 
ibution. 

皿 pletely determine the distribution. For exam- 
ce contract provides for at most two check-ups 
ividuais make two independent choices each year 
: a check-up. If each time the probability is q y 
i binomial with m = 2. 

that the more algorithmic approaches outlined 
l that case judgment is most definitely required, 
bhmic approach to use. 


there are other plots/tables that could be used. Other 


functions. 
to perform the likelihood ratio test and the other is to extract a penalty for
employing additional parameters. The likelihood ratio test is technically only
available when one model is a special case of another (for example, Pareto vs
generalized Pareto). The concept can be turned into an algorithm by using the
test at a 5% significance level. Begin with the best one-parameter model (the
one with the highest loglikelihood value). Add a second parameter only if the
two-parameter model with the highest loglikelihood value shows an increase of
at least 1.92 (so twice the difference exceeds the critical value of 3.84). Then
move to three-parameter models. If the comparison is to a two-parameter
model, a 1.92 increase is again needed. If the early comparison led to keeping
the one-parameter model, an increase of 3.00 is needed (because the test has
two degrees of freedom). To add three parameters requires a 3.91 increase,


not fit well while the fit of the Poisson-inverse Gaussian is marginal at best
(p = 2.88%). The Poisson-inverse Gaussian is a special case (r = -0.5) of
the Poisson-ETNB. Hence, a likelihood ratio test can be formally applied to
determine if the additional parameter r is justified. Because the
loglikelihood increases by 5, which is more than 1.92, the three-parameter
model is a significantly better fit. The chi-square test shows that the
Poisson-ETNB provides an adequate fit. On the other hand, the SBC favors the
Poisson-inverse Gaussian distribution. Given the improved fit in the tail for
the three-parameter model, it seems to be the best choice. □




to that paper, Carlin provides support for the 
Table 13.14 Results for Example 13.13

                 B truncated at 50           B censored at 1,000
Criterion        Exponential   Weibull       Exponential   Weibull
K-S*             0.1340        0.0887        0.0991        0.0991
A-D*             0.4292        0.1631        0.1713        0.1712
χ²               1.4034        0.3615        0.5951        0.5947
p-value          0.8436        0.9481        0.8976        0.7428
Loglikelihood    -146.063      -145.683      -113.647      -113.647
SBC              -147.535      -148.628      -115.145      -116.643

                 C
χ²               61.913        0.3698
p-value          10^{-12}      0.9464
Loglikelihood    -214.924      -202.077
SBC              -217.350      -206.929

*K-S and A-D refer to the Kolmogorov-Smirnov and Anderson-Darling
test statistics, respectively.
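
The SBC rows of Table 13.14 are consistent with a penalized loglikelihood of the form ln L - (r/2) ln n, where r is the number of estimated parameters and n the sample size. That form is an assumption in the sketch below (it is inferred from the table, with n = 19, 20, and 128 for the three data sets), and the function name is illustrative.

```python
import math


def sbc(loglikelihood, n_params, n_obs):
    """Penalized loglikelihood: ln L - (r / 2) ln n (assumed form)."""
    return loglikelihood - 0.5 * n_params * math.log(n_obs)


# (loglikelihood, parameters, sample size, SBC reported in Table 13.14)
rows = [
    (-146.063, 1, 19, -147.535),    # B truncated at 50, exponential
    (-145.683, 2, 19, -148.628),    # B truncated at 50, Weibull
    (-113.647, 1, 20, -115.145),    # B censored at 1,000, exponential
    (-113.647, 2, 20, -116.643),    # B censored at 1,000, Weibull
    (-214.924, 1, 128, -217.350),   # C, exponential
    (-202.077, 2, 128, -206.929),   # C, Weibull
]
computed = [sbc(ll, r, n) for ll, r, n, _ in rows]
for value, (_, _, _, reported) in zip(computed, rows):
    print(round(value, 3), reported)
```

Under this criterion a larger value is better, so the extra Weibull parameter is worthwhile only for Data Set C.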


Example 13.15 The following example is taken from Douglas [29], p. 253.
An insurance company's records for one year show the number of accidents
per day which resulted in a claim to the insurance company for a particular
insurance coverage. The results are in Table 13.16. Determine if a Poisson
model is appropriate.

A Poisson model is fitted to these data. The method of moments and the
maximum likelihood method both lead to the estimate of the mean,

\[ \hat{\lambda} = \frac{742}{365} = 2.0329. \]

The results of a chi-square goodness-of-fit test are in Table 13.17. Any time
such a table is made, the expected count for the last group is

\[ E_{k+} = n\hat{p}_{k+} = n(1 - \hat{p}_0 - \cdots - \hat{p}_{k-1}). \]

The last three groups were combined to ensure an expected count of at
least one for each row. The test statistic is 9.93 with six degrees of
freedom. The critical value at a 5% significance level is 12.59 and the
p-value is 0.1277. By this test the Poisson distribution is an acceptable
model; however, it should be noted that the fit is poorest at the large
values, and with the model understating the observed values, this may be a
risky choice. □
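
The test in Example 13.15 can be reproduced with a few lines of Python. The sketch below is illustrative only: the variable names and the helper `chi2_sf_even` are not from the text, and the closed-form survival function it uses is valid only for an even number of degrees of freedom (six here).

```python
import math

# Observed number of days by claims per day (Table 13.16); the 9+ row is empty.
observed = [47, 97, 109, 62, 25, 16, 4, 3, 2, 0]
n = sum(observed)                                      # 365 days
lam = sum(k * c for k, c in enumerate(observed)) / n   # 742 / 365

# Groups 0, 1, ..., 6 are kept; 7, 8, and 9+ are combined into one group.
cells = observed[:7] + [sum(observed[7:])]
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(7)]
probs.append(1.0 - sum(probs))                         # probability of 7 or more
expected = [n * p for p in probs]

chi_square = sum((o - e) ** 2 / e for o, e in zip(cells, expected))
df = len(cells) - 1 - 1                                # one estimated parameter


def chi2_sf_even(x, dof):
    """P(chi-square > x) for even dof, using the closed-form Poisson sum."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(dof // 2))


p_value = chi2_sf_even(chi_square, df)
print(round(lam, 4), round(chi_square, 2), df, round(p_value, 4))
```

The printed statistic and p-value should agree with the values quoted in the example up to rounding.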

Example 13.16 The data set in Table 12.13 comes from Beard et al. [12]
and was previously analyzed in Example 12.58. Determine a model that
adequately describes the data.


Table 13.15 Results for Example 13.14

                      Negative         Poisson-inverse
                      binomial         Gaussian           Poisson-ETNB
Fitted values         565,708.1        565,712.4          565,661.2
                       68,570.0         68,575.6           68,721.2
                        5,317.2          5,295.9            5,171.7
                          334.9            344.0              362.9
                           18.7             20.8               29.6
                            1.0              1.2                3.0
                            0.0              0.1                0.4

Parameters            β = 0.0350662    λ = 0.123304       λ = 0.123395
                      r = 3.57784      β = 0.0712027      β = 0.233862
                                                          r = -0.846872
Chi square            12.13            7.09               0.29
Degrees of freedom    2                2                  1
p-value               <1%              2.88%              58.9%
-Loglikelihood        251,117          251,114            251,109
SBC                   -251,130         -251,127           -251,129


Table 13.16 Data for Example 13.15

No. of claims/day    Observed no. of days
0                            47
1                            97
2                           109
3                            62
4                            25
5                            16
6                             4
7                             3
8                             2
9+                            0


Parameter estimates from fitting four models are in Table 12.13. Various
fit measures are given in Table 13.18. Only the zero-modified geometric
distribution passes the goodness-of-fit test. It is also clearly superior
according to the SBC. A likelihood ratio test against the geometric has a
test statistic of 2(171,479 - 171,133) = 692, which with one degree of
freedom is clearly significant. This confirms the qualitative conclusion in
Example 12.58. □
Table 13.17 Chi-square goodness-of-fit test for Example 13.15

Claims/day    Observed    Expected    Chi square
0                 47        47.8         0.01
1                 97        97.2         0.00
2                109        98.8         1.06
3                 62        66.9         0.36
4                 25        34.0         2.39
5                 16        13.8         0.34
6                  4         4.7         0.10
7+                 5         1.8         5.66
Totals           365       365           9.93


                      Poisson        Negative binomial    Polya-Aeppli
Parameters            λ = 1.70805    β = 1.15907          λ = 1.10551
                                     r = 1.47364          β = 0.545039
Chi square            72.64          4.06                 2.84
Degrees of freedom    4              5                    5
p-value               <1%            54.05%               72.39%
Loglikelihood         -577.0         -528.8               -528.5
SBC                   -579.8         -534.5               -534.2


provides an almost perfect fit (p-value is large). Note that the Poisson-inverse 
Gaussian has two parameters, like the negative binomial. The SBC also favors 
this choice. This example shows that the Poisson-inverse Gaussian can have 
a much heavier right-hand tail than the negative binomial. □ 

Example 13.19 Comprehensive medical claims were studied by Bevan [15] in 
1963. Male (955 payments) and female (1,291 payments) claims were studied 
separately. The data appear in Table 13.21 where there was a deductible of 
25. Can a common model be used? 

When using the combined data set the lognormal distribution is the best 
two-parameter model. Its negative loglikelihood (NLL) is 4,580.20. This 
is 19.09 better than the one-parameter inverse exponential model and 0.13 
worse than the three-parameter Burr model. Because none of these models 


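Table 13.19 Fit of Simon data

Number of          Number of    Fitted distributions
claims/contract    contracts    Poisson    Negative binomial    Polya-Aeppli
0                      99         54.0          95.9               98.7
1                      65         92.2          75.8               70.6
2                      57         78.8          50.4               50.2
3                      35         44.9          31.3               32.6
4                      20         19.2          18.8               20.0
5                      10                       11.0               11.7
6                       4                        6.4                6.6
7                       0                        3.7                3.6
8                       3                                           2.0
9                       4                                           1.0
10                      0
11                      1
12+                     0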


                      Poisson       Geometric     ZM Poisson    ZM geometric
Chi square            543.0         643.4         64.8          0.58
Degrees of freedom    2             4             2             2
p-value               <1%           <1%           <1%           74.9%
Loglikelihood         -171,373      -171,479      -171,160      -171,133
SBC                   -171,379.5    -171,485.5    -171,173      -171,146


Example 13.17 The data in Table 13.19, from Simon [122], represent the
observed number of claims per contract for 298 contracts. Determine an
appropriate model.

The Poisson, negative binomial, and Polya-Aeppli distributions are fitted
to the data. The Polya-Aeppli and the negative binomial are both plausible
distributions. The p-value of the chi-square statistic and the loglikelihood
both indicate that the Polya-Aeppli is slightly better than the negative
binomial. The SBC verifies that both models are superior to the Poisson
distribution. The ultimate choice may depend on familiarity, prior use, and
computational convenience of the negative binomial versus the Polya-Aeppli
model. □

Example 13.18 Consider the data in Table 13.20 on automobile liability
policies in Switzerland taken from Bühlmann [19]. Determine an appropriate
model.

Three models are considered in Table 13.20. The Poisson distribution is
a very bad fit. Its tail is far too light compared with the actual experience.


No. of accidents    Observed frequency
0                       103,704
1                        14,075
2                         1,766
3                           255
4                            45
5                             6
6                             2
7+                            0


a P.-i.G. stands for Poisson-inverse Gaussian. 

the lognormal is preferred. The parameters are μ = 4.5237 and σ = 1.4950.
When separate lognormal models are fit to males (μ = 3.9686 and σ = 1.8432)
and females (μ = 4.7713 and σ = 1.2848), the respective NLLs are 1,977.25
and 2,583.82 for a total of 4,561.07. This is an improvement of 19.13 over a
common lognormal model, which is significant by both the LRT (3.00 needed)
and SBC (7.72 needed). Sometimes it is useful to be able to use the same
nonscale parameter in both models. When a common value of σ is used, the
NLL is 4,579.77, which is significantly worse than using separate models. □
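
The two thresholds quoted in Example 13.19 ("3.00 needed" and "7.72 needed") can be checked directly. This is a minimal sketch with illustrative variable names; it assumes the 5% significance level and relies on the fact that a chi-square variable with two degrees of freedom has survival function exp(-x/2), so the critical value has a closed form.

```python
import math

n = 955 + 1291                     # male plus female payments
nll_common = 4580.20               # one lognormal for the combined data
nll_separate = 1977.25 + 2583.82   # separate male and female lognormals
extra_params = 2                   # four parameters instead of two

improvement = nll_common - nll_separate

# LRT: twice the improvement is compared with the chi-square critical value.
# For 2 degrees of freedom the 5% critical value is -2 ln(0.05), so the
# improvement in NLL itself must exceed half of that.
lrt_needed = -2.0 * math.log(0.05) / 2.0

# SBC: each extra parameter costs (1/2) ln(n) in loglikelihood.
sbc_needed = extra_params / 2.0 * math.log(n)

print(round(improvement, 2), round(lrt_needed, 2), round(sbc_needed, 2))
```

Both thresholds are far below the improvement of 19.13, which is why the example concludes that separate models are warranted.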

Example 13.20 In 1958 Longley-Cook [86] examined employment patterns
of casualty actuaries. One of his tables listed the number of members of the
Casualty Actuarial Society employed by casualty companies in 1949 (55
actuaries) and 1957 (78 actuaries). Using the data in Table 13.22 determine
a model for the number of actuaries per company which employs at least one
actuary and find out whether the distribution has changed over the eight-year
period.

Because a value of zero is impossible, only zero-truncated distributions
should be considered. In all three cases (1949 data only, 1957 data only,
combined data) the ZT logarithmic and ZT (extended) negative binomial
distributions have acceptable goodness-of-fit test values. The improvement in
NLL is 0.52, 0.02, and 0.94. The LRT can be applied (except that the ZT


Table 13.20 Fit of Bühlmann data

No. of       Observed     Fitted distributions
accidents    frequency    Poisson         Negative binomial    P.-i.G.^a

Parameters                λ = 0.155140    β = 0.150232         λ = 0.144667
                                          r = 1.03267          β = 0.310536
Chi square                1,332.3         12.12                0.78
Degrees of freedom        2               2                    3
p-value                   <1%             <1%                  85.5%
Loglikelihood             -55,108.5       -54,615.3            -54,609.8
SBC                       -55,114.3       -54,627.0            -54,621.5


Table 13.21 Comprehensive medical losses for Example 13.19

Loss             Male    Female
25-50             184      199
50-100            270      310
100-200           160      262
200-300            88      163
300-400            63      103
400-500            47       69
500-1,000          61      124
1,000-2,000        35       40
2,000-3,000        18       12
3,000-4,000        13        4
4,000-5,000         2        1
5,000-6,667         5        2
6,667-7,500         3        1
7,500-10,000        6        1


Table 13.22 Number of actuaries per company for Example 13.20

Number of    Number of          Number of
actuaries    companies, 1949    companies, 1957
1                 17                 23
2                  7                  7
3-4                3                  3
5-9                2                  3
10+                0                  1


logarithmic distribution is a limiting case of the ZT negative binomial
distribution with r → 0), and the improvement is not significant in any of
the cases. The same conclusions apply if the SBC is used. The parameter
estimates (where β is the only parameter) are 2.0227, 2.8114, and 2.4479,
respectively. The NLL for the combined data set is 74.35 while the total for
the two separate models is 74.15. The improvement is only 0.20, which is not
significant (there is one degree of freedom). Even though the estimated mean
has increased from 2.0227/ln(3.0227) = 1.8286 to 2.8114/ln(3.8114) = 2.1012,
there is not enough data to make a convincing case that the true mean has
increased. □
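
The two means quoted at the end of Example 13.20 follow from the mean of the logarithmic distribution, β/ln(1 + β). A short check, with an illustrative function name that is not from the text:

```python
import math


def logarithmic_mean(beta):
    """Mean of the (zero-truncated) logarithmic distribution."""
    return beta / math.log(1.0 + beta)


mean_1949 = logarithmic_mean(2.0227)
mean_1957 = logarithmic_mean(2.8114)
print(round(mean_1949, 4), round(mean_1957, 4))
```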

13.5.4 Exercises 

13.20 (*) One thousand policies were sampled and the number of accidents
for each recorded. The results are in Table 13.23. Without doing any formal
tests, determine which of the following five models is most appropriate:
binomial, Poisson, negative binomial, normal, gamma.


Table 13.23 Data for Exercise 13.20

No. of accidents    No. of policies
0                        100
1                        267
2                        311
3                        208
4                         87
5                         23
6                          4
Total                  1,000


Table 13.24 Results for Exercise 13.23

Model                  No. of parameters    Negative loglikelihood
Generalized Pareto             3                    219.1
Burr                           3                    219.2
Pareto                         2                    221.2
Lognormal                      2                    221.4
Inverse exponential            1                    224.3

13.21 For Example 13.1, determine if a transformed gamma model is more 
appropriate than either the exponential model or the Weibull model for each 
of the three data sets. 

13.22 (*) From the data in Exercise 13.11 the maximum likelihood estimates
are λ = 0.60 for the Poisson distribution and r = 2.9 and β = 0.21 for the
negative binomial distribution. Conduct the likelihood ratio test for choosing
between these two models.

13.23 (*) From a sample of size 100, five models are fit with the results given
in Table 13.24. Use the Schwarz Bayesian criterion to select the best model.

13.24 This is a continuation of Exercise 12.38. Use both the likelihood ratio
test (at a 5% significance level) and the Schwarz Bayesian criterion to decide
if Sylvia's claim is true.

13.25 Using the results from Exercises 12.96 and 13.15, use the chi-square
goodness-of-fit test, the likelihood ratio test, and the Schwarz Bayesian
criterion to determine the best model from the members of the (a, b, 0) class.
Table 13.25 Data for Exercise 13.28


No. of medical claims 





















13.26 Using the results from Exercises 12.98 and 13.16, use the chi-square
goodness-of-fit test, the likelihood ratio test, and the Schwarz Bayesian
criterion to determine the best model from the members of the (a, b, 0) class.

13.27 Using the results from Exercises 12.99 and 13.17, use the chi-square
goodness-of-fit test, the likelihood ratio test, and the Schwarz Bayesian
criterion to determine the best model from the members of the (a, b, 0) class.

13.28 Table 13.25 gives the number of medical claims per reported automobile
accident.

(a) Construct a plot similar to Figure 4.8. Does it appear that a member
of the (a, b, 0) class will provide a good model? If so, which
one?

(b) Determine the maximum likelihood estimates of the parameters for
each member of the (a, b, 0) class.

(c) Based on the chi-square goodness-of-fit test, the likelihood ratio
test, and the Schwarz Bayesian criterion, which member of the
(a, b, 0) class provides the best fit? Is this model acceptable?

13.29 For the four data sets introduced in Exercises 12.96, 12.98, 12.99,
and 13.28, you have determined the best model from among members of the
(a, b, 0) class. For each data set determine the maximum likelihood estimates
of the zero-modified Poisson, geometric, logarithmic, and negative binomial
distributions. Use the chi-square goodness-of-fit test and likelihood ratio
tests to determine the best of the eight models considered and state whether
or not the selected model is acceptable.

Table 13.26 Data for Exercise 13.32(a)

No. of claims    No. of policies
0                    96,978
1                     9,240
2                       704
3                        43
4                         9
5+                        0


13.30 A frequency model that has not been mentioned to this point is the
zeta distribution. It is a zero-truncated distribution with

\[ p_k^T = \frac{k^{-(\rho+1)}}{\zeta(\rho+1)}, \quad k = 1, 2, \ldots, \quad \rho > 0. \]

The denominator is the zeta function, which must be evaluated numerically as

\[ \zeta(\rho+1) = \sum_{k=1}^{\infty} k^{-(\rho+1)}. \]

The zero-modified zeta distribution can be formed in the usual way. More
information can be found in Luong and Doray [88].

(a) Determine the maximum likelihood estimates of the parameters of 
the zero-modified zeta distribution for the data in Example 12.58. 

(b) Is the zero-modified zeta distribution acceptable? 
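
Because the zeta function in Exercise 13.30 has to be evaluated numerically, a small helper may be useful when setting up the likelihood. The sketch below is one possible approach, not the method of the text: it sums the first terms directly and approximates the remaining tail with the leading Euler-Maclaurin terms. The names `zeta` and `zeta_pf` and the cutoff of 1,000 terms are illustrative choices.

```python
import math


def zeta(s, terms=1000):
    """Zeta function for s > 1: direct partial sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -s for k in range(1, terms))
    tail = (terms ** (1.0 - s) / (s - 1.0)      # integral of x^(-s) from `terms`
            + 0.5 * terms ** -s                 # half of the first tail term
            + s * terms ** (-s - 1.0) / 12.0)   # first derivative correction
    return head + tail


def zeta_pf(k, rho):
    """Probability function of the zeta distribution at k = 1, 2, ..."""
    return k ** -(rho + 1.0) / zeta(rho + 1.0)


# With rho = 1 the normalizing constant is zeta(2) = pi^2 / 6.
print(zeta(2.0), math.pi ** 2 / 6.0)
print(zeta_pf(1, 1.0))
```

For values of ρ close to zero the series converges slowly, which is why the tail correction is included rather than relying on truncation alone.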

13.31 In Exercise 13.29 the best model from among the members of the
(a, b, 0) and (a, b, 1) classes was selected for the data sets in Exercises
12.96, 12.98, 12.99, and 13.28. Fit the Poisson-Poisson, Polya-Aeppli,
Poisson-inverse Gaussian, and Poisson-ETNB distributions to these data and
determine if any of these distributions should replace the one selected in
Exercise 13.29. Is the current best model acceptable?

13.32 The five data sets presented in this problem are all taken from Lemaire
[82]. For each data set compute the first three moments and then use the ideas
in Section 4.6.8 to make a guess at an appropriate model from among the
compound Poisson collection (Poisson, geometric, negative binomial, Poisson-
binomial (with m = 2 and m = 3), Polya-Aeppli, Neyman Type A, Poisson-
inverse Gaussian, and Poisson-ETNB). From the selected model (if any) and
members of the (a, b, 0) and (a, b, 1) classes, determine the best model.

(a) The data in Table 13.26 represent counts from third-party automobile
liability coverage in Belgium.

(b) The data in Table 13.27 represent the number of deaths due to
horse kicks in the Prussian army between 1875 and 1894. The
counts are the number of deaths in a corps (there were 10 of them)
in a given year, and thus there are 200 observations. This data set
is often cited as the inspiration for the Poisson distribution. For
using any of our models, what additional assumption about the
data must be made?

(c) The data in Table 13.28 represent the number of major international
wars per year from 1500 through 1931.
Table 13.27 Data for Exercise 13.32(b)

No. of deaths    No. of corps
0                    109
1                     65
2                     22
3                      3
4                      1
5+                     0


Table 13.28 Data for Exercise 13.32(c)

No. of wars    No. of years
0                  223
1                  142
2                   48
3                   15
4                    4
5+                   0


Table 13.29 Data for Exercise 13.32(d)

No. of runs    No. of half innings
0                   1,023
1                     222
2                      87
3                      32
4                      18
5                      11
6                       6
7+                      3


(d) The data in Table 13.29 represent the number of runs scored in 
each half-inning of World Series baseball games played from 1947 
through 1960. 

(e) The data in Table 13.30 represent the number of goals per game 
per team in the 1966-1967 season of the National Hockey League. 

Table 13.30 Data for Exercise 13.32(e)

No. of goals    No. of games
0                    29
1                    71
2                    82
3                    89
4                    65
5                    45
6                    24
7                     7
8                     4
9                     1
10+                   3


13.33 Verify that the estimates presented in Example 4.64 are the maximum
likelihood estimates. (Because only two decimals are presented, it is probably
sufficient to observe that the likelihood function takes on smaller values at
each of the nearby points.) The negative binomial distribution was fit to
these data in Example 12.56. Which of these two models is preferable?


Five examples 


14.1 INTRODUCTION 


In this chapter we present five examples that illustrate many of the concepts
discussed to this point. The first is a model for the time to death. The second
model is for the time from when a medical malpractice incident occurs to when
it is reported. The third model is for the amount of a liability payment. This
model is also continuous but most likely has a decreasing failure rate (typical
of payment amount variables). On the other hand, time to event variables
tend to have an increasing failure rate. The last two examples add aggregate
loss calculations from Chapter 6 to the mix.


14.2 TIME TO DEATH 
14.2.1 The data 


A variety of mortality tables are available from the Society of Actuaries at
www.soa.org. The typical mortality table provides values of the survival
function at each whole-number age at death. Table 14.1 represents female
mortality in 1900, with only some of the data points presented. It is followed by































For the second problem, let Z = 1,000(1.06^{-T}) be the present value random
variable, where T is the time in years to death of the 20-year-old. The
calculation is

\[ E(Z) = 1{,}000 \int_0^{80} 1.06^{-t} \, \frac{f(20+t)}{S(20)} \, dt. \]

When linear interpolation is used to obtain the survival function at
intermediate ages, the density function becomes the slope. That is, if x is a
multiple of 5, then

\[ f(t) = \frac{S(x) - S(x+5)}{5}, \quad x < t < x + 5. \]

Breaking the range of integration into 16 pieces gives

\[ E(Z) = \frac{1{,}000}{S(20)} \sum_{j=0}^{15} \int_{5j}^{5j+5} \frac{S(20+5j) - S(25+5j)}{5} \, 1.06^{-t} \, dt \]

\[ = \frac{200}{S(20)} \sum_{j=0}^{15} [S(20+5j) - S(25+5j)] \, \frac{1.06^{-5j} - 1.06^{-5j-5}}{\ln 1.06}. \]


Consider the Makeham distribution with hazard rate function h(x) = A + Bc^x.
Then

\[ S(x) = \exp\left[ -Ax - \frac{B(c^x - 1)}{\ln c} \right]. \]

Maximum likelihood estimation cannot be used because 
given. Because it is unlikely that this model will be effective 


age 30 to 35 is 30 ln{[S(30) - S(35)]/S(20)} with the survival function using
the Makeham distribution. The sample size comes from 1,000(0.711 - 0.681)
with these survival function values taken from the “data.” The values that
maximize this likelihood function are A = 0.006698, B = 0.00007976, and
c = 1.09563. In Figure 14.3 the diamonds represent the “data” and the solid
curve is the Makeham survival function (both have been conditioned on being
alive at age 20). The fit is almost too good, suggesting that perhaps this
mortality table was already smoothed to follow a Makeham distribution at
adult ages.


Fig. 14.3 Comparison of “data” and Makeham model.


at each age. The answer is 8,405.24. For the insurance, it is difficult to do 
the integral analytically. Linear interpolation was used between integral ages 
to produce an answer of 154.90. The agreement with the answers obtained 
earlier is not surprising. 
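
The insurance calculation under the fitted Makeham model can be sketched as follows. This is an illustration under stated assumptions rather than the text's own computation: the function names, the terminal age of 120, and the year-by-year loop are choices made here. Within each year of age the survival function is interpolated linearly, so deaths are uniform over the year and the discount factor integrates in closed form.

```python
import math

# Makeham parameter estimates quoted in the text.
A, B, C = 0.006698, 0.00007976, 1.09563


def survival(x):
    """Makeham survival function S(x) = exp(-A x - B (c^x - 1) / ln c)."""
    return math.exp(-A * x - B * (C ** x - 1.0) / math.log(C))


def expected_present_value(age=20, benefit=1000.0, i=0.06, max_age=120):
    """E(Z) for a benefit paid at death, linear interpolation between ages."""
    delta = math.log(1.0 + i)
    total = 0.0
    for k in range(max_age - age):
        # Probability that a life aged `age` dies between ages age+k and age+k+1.
        death_prob = (survival(age + k) - survival(age + k + 1)) / survival(age)
        # Integral of (1+i)^(-t) over (k, k+1) for a death uniform in that year.
        discount = ((1.0 + i) ** -k - (1.0 + i) ** -(k + 1)) / delta
        total += death_prob * discount
    return benefit * total


print(round(expected_present_value(), 2))
```

The result should be close to the 154.90 reported in the text; small differences can arise from the rounded parameter values and the assumed terminal age.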

14.2.3 Exercise 

14.1 From ages 5 through 100 the mean residual life function is essentially
linear. Because insurances are rarely sold under age 5, it would be reasonable
to extend the graph linearly back to 0. Then a reasonable approximation is
e(x) = 60 - 0.6x. From this, determine the density and survival function for
the age at death and then use this function to solve the two problems.

14.3 TIME FROM INCIDENCE TO REPORT 

Consider an insurance contract that provides payment when a certain event
(such as death, disability, fire) occurs. There are three key dates. The first
is when the event occurs, the second is when it is reported to the insurance
company, and the third is when the claim is settled. The time between these
dates is important because it affects the amount of interest that can be earned
on the premium prior to paying the claim and because it provides a mechanism
for estimating unreported claims. This example concerns the time from
incidence to report. The particular example used here is based on a paper by
Accomando and Weissner [4].
It can be seen from the histogram that the underlying distribution has a
nonzero mode. To check the tail, we can compute the empirical mean residual
life function at a number of values. They are presented in Table 14.4. The
function appears to be fairly constant and so an exponential model seems
reasonable.
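
The empirical mean residual life function can be computed directly from the 22 losses in excess of 200 that are listed in Section 14.4.1. The sketch below uses an illustrative function name; at each threshold d it averages the excess x - d over the losses strictly greater than d.

```python
# The 22 losses (in thousands) that exceed 200, as listed in Section 14.4.1.
losses_above_200 = [206, 219, 230, 235, 241, 272, 283, 286, 312, 319, 385,
                    427, 434, 555, 562, 584, 700, 711, 869, 980, 999, 1506]


def mean_residual_life(losses, d):
    """Average amount by which losses above d exceed d."""
    excess = [x - d for x in losses if x > d]
    return sum(excess) / len(excess)


for d in range(200, 1000, 100):
    print(d, round(mean_residual_life(losses_above_200, d), 1))
```

The value at 200 is about 314, which is also the exponential parameter estimate used in the first model, and the remaining values stay in a fairly narrow band, consistent with the remark that the function is roughly constant.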

14.4.2 The first model 

A two-component spliced model was selected. The empirical model is used
through 200 (thousand) and an exponential model thereafter. There are (at
least) two ways to choose the exponential model. One is to restrict the
parameter by forcing the distribution to place 11% (22 out of 200) of
probability at points above 200. The other option is to estimate the
exponential model independent of the 11% requirement and then multiply the
density function to make the area above 200 be 0.11. The latter was selected
and the resulting parameter estimate is θ = 314. For values below 200, the
empirical distribution places probability 1/200 at each observed value. The
resulting exponential density function (for x > 200) is

\[ f(x) = 0.000662344 e^{-x/314}. \]

For a coverage that pays all losses, the kth moment is (where the 200 losses
in the sample have been ordered from smallest to largest)

\[ E(X^k) = \frac{1}{200} \sum_{j=1}^{178} x_j^k + \int_{200}^{\infty} x^k f(x) \, dx. \]

Then,


Table 14.4 Mean residual life for losses above 200 (thousand) 


Table 14.3 Losses up to 200 (thousand)

Loss range     Number of    Loss range     Number of
(thousands)    losses       (thousands)    losses
1-5                3        41-50             19
6-10              12        51-75             28
11-15             14        76-100            21
16-20              9        101-125           15
21-25              7        126-150           10
26-30              7        151-200           15
31-40             18




Fig. 14.5 Histogram of losses.


14.4.1 The data 

One hundred seventy-eight losses that were 200,000 or below (all expressed in 
whole numbers of thousands of dollars) that were supplied are summarized in 
Table 14.3. In addition, there were 22 losses in excess of 200. They are listed 
below: 

206 219 230 235 241 272 283 286 312 319 385 

427 434 555 562 584 700 711 869 980 999 1506 

Finally, the 178 losses in the table sum to 11,398 and their squares sum to 
1,143,164. 

To get a feel for the data in the table, the histogram in Figure 14.5 was
constructed. Keep in mind that the height of a histogram bar is the count
in the cell divided by the sample size (200) and then further divided by the
interval width. Therefore, the first bar has a height of 3/[200(5)] = 0.003.


Loss    Mean residual life
200          314
300          367
400          357
500          330
600          361
700          313
800          289
900          262
\[ E(X) = \frac{11{,}398}{200} + 0.000662344[314(200) + 314^2] e^{-200/314} = 113.53, \]

\[ E(X^2) = \frac{1{,}143{,}164}{200} + 0.000662344[314(200)^2 + 2(314)^2(200) + 2(314)^3] e^{-200/314} = 45{,}622.93. \]


The variance is 45,622.93 - 113.53^2 = 32,733.87 for a coefficient of variation
of 1.59. However, these are for one loss only. The distribution of annual losses
follows a compound Poisson distribution. The mean is

\[ E(S) = E(N)E(X) = 21(113.53) = 2{,}384.13 \]

and the variance is

\[ \mathrm{Var}(S) = E(N)\,\mathrm{Var}(X) + \mathrm{Var}(N)\,E(X)^2 = 21(32{,}733.87) + 21(113.53)^2 = 958{,}081.53 \]

for a coefficient of variation of 0.41.
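
The moments of the spliced model and of the annual compound Poisson total can be reproduced as below. The sketch is illustrative: variable names are chosen here, the constant c is recomputed from the 11% tail weight and θ = 314 rather than taken as given, and the Poisson mean of 21 losses per year is the value used in the text.

```python
import math

theta = 314.0                  # exponential parameter for losses above 200
tail_weight = 22 / 200         # probability assigned to losses above 200
# Scaling constant of the density c * exp(-x / theta) on x > 200.
c = tail_weight / theta * math.exp(200.0 / theta)

n = 200
sum_x, sum_x2 = 11_398, 1_143_164    # sums over the 178 losses at or below 200
decay = math.exp(-200.0 / theta)

# Empirical part plus the closed-form integral over (200, infinity).
ex = sum_x / n + c * (theta * 200 + theta ** 2) * decay
ex2 = sum_x2 / n + c * (theta * 200 ** 2 + 2 * theta ** 2 * 200
                        + 2 * theta ** 3) * decay
var_x = ex2 - ex ** 2
cv_x = math.sqrt(var_x) / ex

# Compound Poisson annual total: Var(N) = E(N) = 21.
expected_count = 21
mean_s = expected_count * ex
var_s = expected_count * var_x + expected_count * ex ** 2
cv_s = math.sqrt(var_s) / mean_s

print(round(ex, 2), round(ex2, 2), round(cv_x, 2))
print(round(mean_s, 2), round(var_s, 2), round(cv_s, 2))
```

Because c is not rounded here, the second moment and the annual variance may differ from the printed figures in the last digit or two.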


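expected moments. For u > 200,

\[ E(X \wedge u) = 56.99 + c \int_{200}^{u} x e^{-x/314} \, dx + c \int_{u}^{\infty} u e^{-x/314} \, dx \]

\[ = 56.99 + c \left( -314 x e^{-x/314} - 314^2 e^{-x/314} \right) \Big|_{200}^{u} + c (314 u) e^{-u/314} \]

\[ = 56.99 + c \left( 161{,}396 e^{-200/314} - 314^2 e^{-u/314} \right), \]

where c = 0.000662344 and similarly

\[ E[(X \wedge u)^2] = 5{,}715.82 + c \int_{200}^{u} x^2 e^{-x/314} \, dx + c \int_{u}^{\infty} u^2 e^{-x/314} \, dx \]

\[ = 5{,}715.82 + c \left[ -314 x^2 - 314^2 (2x) - 314^3 (2) \right] e^{-x/314} \Big|_{200}^{u} + c (314 u^2) e^{-u/314} \]

\[ = 5{,}715.82 + c \left[ 113{,}916{,}688 e^{-200/314} - (197{,}192 u + 61{,}918{,}288) e^{-u/314} \right]. \]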


- e coverage limitations reduce 

the variance, the risk, as measured by the coefficient of variation, has increased 
considerably. For a full year, the mean is 593.04, the variance is 321,138.09, 
and the coefficient of variation is 0.96. 

For the 2,000 excess of 500 coverage, we have, for one loss,

Mean                        = 113.51 - 100.24 = 13.27,
Second moment               = 45,494.83 - 23,993.47 - 2(500)(13.27)
                            = 8,231.36,
Variance                    = 8,231.36 - 13.27^2 = 8,055.27,
Coefficient of variation    = sqrt(8,055.27)/13.27 = 6.76.

Moving further into the tail increases our risk. For one year, the three items
are 278.67, 172,858.56, and 1.49.
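
The layer calculation for the 2,000 excess of 500 coverage can be sketched from the limited expected moment formulas for u > 200. Function names here are illustrative. The second moment of the layer uses the identity E[Y^2] = E[(X ∧ u)^2] - E[(X ∧ d)^2] - 2d{E(X ∧ u) - E(X ∧ d)} with d = 500 and u = 2,500, which is the same arithmetic as in the display above.

```python
import math

THETA = 314.0
C = 0.000662344            # scaling constant of the exponential tail
DECAY_200 = math.exp(-200.0 / THETA)


def lev1(u):
    """E(X ^ u) for u > 200 under the spliced model."""
    return 56.99 + C * ((THETA * 200 + THETA ** 2) * DECAY_200
                        - THETA ** 2 * math.exp(-u / THETA))


def lev2(u):
    """E[(X ^ u)^2] for u > 200 under the spliced model."""
    full = THETA * 200 ** 2 + 2 * THETA ** 2 * 200 + 2 * THETA ** 3
    return 5715.82 + C * (full * DECAY_200
                          - (2 * THETA ** 2 * u + 2 * THETA ** 3)
                          * math.exp(-u / THETA))


def layer_moments(d, u):
    """Mean, second moment, variance, and CV of the layer from d to u."""
    mean = lev1(u) - lev1(d)
    second = lev2(u) - lev2(d) - 2 * d * mean
    variance = second - mean ** 2
    return mean, second, variance, math.sqrt(variance) / mean


mean, second, variance, cv = layer_moments(500.0, 2500.0)
print(round(lev1(500.0), 2), round(lev1(2500.0), 2))
print(round(mean, 2), round(second, 2), round(variance, 2), round(cv, 2))
```

The text rounds the two limited expected values before subtracting, so the unrounded layer mean and second moment computed here differ slightly from 13.27 and 8,231.36.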


14.4.3 The second model 










covered employee) are as follows: 


1. For each hospitalization of the employee or a member of the employee's
family, the employee pays the first 500 plus any losses in excess of
50,500. On any one hospitalization, the insurance will pay at most
50,000.

2. In any calendar year, the employee will pay no more than 1,000 in
deductibles, but there is no limit on how much the employee will pay in
respect of losses exceeding 50,500.

3. Any particular hospitalization is assigned to the calendar year in which
the individual entered the hospital. Even if hospitalization extends into
Table 14.6 Hospitalizations, per family member, per year


No. of hospitalizations 
per family member 

No. of family members 












2,924 



No. of employees 


84 


140 


139 


131 


73 


42 


27 


33 


669 


4. The premium is the same, regardless of the number of family members.

Experience studies have provided the data contained in Tables 14.6 and
14.8. The data in Table 14.7 represent the profile of the current set of
employees.

The first step is to fit parametric models to each of the three data sets. For
the data in Table 14.6, 12 distributions were fitted. The best one-parameter
distribution is the geometric with a negative loglikelihood (NLL) of 969.251
and a chi-square goodness-of-fit p-value of 0.5325. The best two-parameter
model is the zero-modified geometric. The NLL improves to 969.058, but by
the likelihood ratio test, this is not sufficient to justify the second parameter.
The best three-parameter distribution is the zero-modified negative binomial,
which has an NLL of 969.056, again not enough to dislodge the geometric as
our choice. For the two- and three-parameter models there were not enough
degrees of freedom to conduct the chi-square test. We choose the geometric
distribution with β = 0.098495.
Table 14.8 Losses per hospitalization

Loss per hospitalization    No. of hospitalizations
0-250                                36
250-500                              29
500-1,000                            43
1,000-1,500                          35
1,500-2,500                          39
2,500-5,000                          47
5,000-10,000                         33
10,000-50,000                        24
50,000-                               2
Total                               288


For the data in Table 14.7, only zero-truncated distributions should be
considered. The best one-parameter model is the zero-truncated Poisson with an
NLL of 1,298.725 and a p-value near zero. The two-parameter zero-truncated
negative binomial has an NLL of 1,292.532, a significant improvement. The
p-value is 0.2571, indicating that this is an acceptable choice. The parameters
are r = 13.207 and β = 0.25884.
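
The two one-parameter comparisons described above (geometric versus zero-modified geometric, and zero-truncated Poisson versus zero-truncated negative binomial) can be checked with a likelihood ratio test. The sketch below is illustrative and treats the statistic as chi-square with one degree of freedom, which is an approximation when the simpler model is a limiting case of the larger one. For one degree of freedom the survival function equals erfc(sqrt(x/2)), so no statistics library is needed.

```python
import math


def lrt_one_df(nll_restricted, nll_full):
    """Likelihood ratio statistic and chi-square(1) p-value."""
    statistic = 2.0 * (nll_restricted - nll_full)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Geometric versus zero-modified geometric (Table 14.6 data).
stat_geo, p_geo = lrt_one_df(969.251, 969.058)

# Zero-truncated Poisson versus zero-truncated negative binomial (Table 14.7 data).
stat_zt, p_zt = lrt_one_df(1298.725, 1292.532)

print(round(stat_geo, 3), p_geo > 0.05)   # second parameter not justified
print(round(stat_zt, 3), p_zt < 0.05)     # second parameter justified
```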

For the data in Table 14.8, 15 continuous distributions were fitted. The
four best models for a given number of parameters are listed in Table 14.9. It
should be clear that the best choice is the Pareto distribution. The parameters
are α = 1.6693 and θ = 3,053.0.

The remaining calculations were done using the recursive method, but
inversion or simulation would work equally well.

The first step is to determine the distribution of payments by the employee
per family member with regard to the deductible. The frequency distribution
is the geometric distribution while the individual loss distribution is the Pareto
distribution, limited to the maximum deductible of 500. That is, any losses
in excess of 500 are assigned to the value 500. With regard to discretization
for recursion, the span should divide evenly into 500 and then all probability

Table 14.9 Four best models for loss per hospitalization

Name                   No. of parameters    NLL        p-value
Inverse exponential            1            632.632    Near 0
Pareto                         2            601.642    0.9818
Burr                           3            601.612    0.9476
Transformed beta               4            601.553    0.8798


Table 14.10 Discretized Pareto distribution with 500 limit

Loss    Probability
0        0.000273
1        0.000546
2        0.000546
3        0.000545
498      0.000365
499      0.000365
500      0.776512


Table 14.11 Probabilities for aggregate deductibles per family

Loss     Probability
0         0.910359
1         0.000045
2         0.000045
3         0.000045
499       0.000031
500       0.063386
501       0.000007
999       0.000004
1,000     0.004413
1,001     0.000001

not accounted for by the time 500 is reached is placed there. F( 
a span of 1 was used. The first few and last few values of 
distribution appear in Table 14.10. After applying the recursh 
clear that there is non-zero probability beyond 3,000. However, 
Table 14.12 Probabilities for aggregate deductibles per employee

Loss     Probability
0         0.725517
1         0.000116
2         0.000115
3         0.000115
499       0.000082
500       0.164284
501       0.000047
999       0.000031
1,000     0.042343


Recursions can again be used to obtain this distribution. Because there is a
1,000 limit on deductibles, all probability to the right of 1,000 will be placed
at 1,000. Selected values from this aggregate distribution are given in Table
14.12. Note that the chance that more than 1,000 in deductibles will be paid is
very small. The cost to the insurer of limiting the insured’s costs is also small.
Using this discrete distribution, it is easy to obtain the mean and standard
deviation of aggregate deductibles. They are 150.02 and 274.42, respectively.
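
The two-stage recursion described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: function names are chosen here, the Pareto is discretized by the method of rounding with a span of 1, and the standard (a, b, 1) recursion is used for both stages (for the geometric frequency the extra leading term vanishes). The zero-truncated negative binomial quantities a, b, and p_1 follow the usual parameterization with parameters r and β.

```python
import math

ALPHA, THETA = 1.6693, 3053.0   # Pareto loss per hospitalization
BETA_GEO = 0.098495             # geometric hospitalizations per family member
R, BETA_NB = 13.207, 0.25884    # zero-truncated negative binomial family size


def pareto_cdf(x):
    return 1.0 - (THETA / (x + THETA)) ** ALPHA


def discretize_limited(limit=500):
    """Method of rounding, span 1; mass not used before `limit` is placed there."""
    pf = [pareto_cdf(0.5)]
    pf += [pareto_cdf(k + 0.5) - pareto_cdf(k - 0.5) for k in range(1, limit)]
    pf.append(1.0 - pareto_cdf(limit - 0.5))
    return pf


def compound_pf(sev, a, b, p0, p1, f0, size):
    """(a, b, 1) recursion for the compound pf at 0, 1, ..., size - 1."""
    sev = sev + [0.0] * max(0, size - len(sev))
    f = [f0]
    for x in range(1, size):
        total = (p1 - (a + b) * p0) * sev[x]
        total += sum((a + b * y / x) * sev[y] * f[x - y] for y in range(1, x + 1))
        f.append(total / (1.0 - a * sev[0]))
    return f


# Stage 1: deductibles per family member (geometric frequency, b = 0).
loss_pf = discretize_limited(500)
a_geo = BETA_GEO / (1.0 + BETA_GEO)
p0_geo = 1.0 / (1.0 + BETA_GEO)
member_pf = compound_pf(loss_pf, a_geo, 0.0, p0_geo, a_geo * p0_geo,
                        1.0 / (1.0 + BETA_GEO * (1.0 - loss_pf[0])), 1000)

# Stage 2: deductibles per employee (zero-truncated negative binomial frequency).
a_nb = BETA_NB / (1.0 + BETA_NB)
b_nb = (R - 1.0) * a_nb
tail = (1.0 + BETA_NB) ** -R
p1_nb = R * BETA_NB / ((1.0 + BETA_NB) ** (R + 1.0) - (1.0 + BETA_NB))
f0_nb = ((1.0 - BETA_NB * (member_pf[0] - 1.0)) ** -R - tail) / (1.0 - tail)
employee_pf = compound_pf(member_pf, a_nb, b_nb, 0.0, p1_nb, f0_nb, 1000)
employee_pf.append(1.0 - sum(employee_pf))   # all mass at or above 1,000

mean = sum(k * p for k, p in enumerate(employee_pf))
sd = math.sqrt(sum(k * k * p for k, p in enumerate(employee_pf)) - mean ** 2)
print(round(member_pf[0], 6), round(employee_pf[0], 6), round(mean, 2), round(sd, 2))
```

The probabilities at zero can be compared with the first rows of the per-family and per-employee tables, and the mean and standard deviation with the 150.02 and 274.42 reported above; agreement is expected only up to the rounding of the published parameter estimates.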

We next require the expected value of aggregate costs to the insurer for
individual losses below the upper limit of 50,000. This can be found
analytically. The expected payment per loss is E(X ∧ 50,500) = 3,890.87 for
the Pareto distribution. The expected number of losses per family member
is the mean of the geometric distribution which is the parameter, 0.098495.
The expected number of family members per employee comes from the
zero-truncated negative binomial distribution and is 3.59015. This implies that
the expected number of losses per employee is 0.098495(3.59015) = 0.353612.
Then the expected aggregate dollars in payments up to the individual limit is
0.353612(3,890.87) = 1,375.86.
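
The analytic pieces of this calculation are short enough to verify directly. The sketch below uses illustrative names; it relies on the Pareto limited expected value θ/(α - 1)[1 - (θ/(u + θ))^(α-1)] and on the mean of the zero-truncated negative binomial, rβ/[1 - (1 + β)^(-r)]. The figure 150.02 for expected aggregate deductibles is taken from the text.

```python
ALPHA, THETA = 1.6693, 3053.0   # Pareto loss per hospitalization
BETA_GEO = 0.098495             # geometric mean hospitalizations per member
R, BETA_NB = 13.207, 0.25884    # zero-truncated negative binomial family size


def pareto_lev(u):
    """E(X ^ u) for the Pareto distribution with alpha > 1."""
    return THETA / (ALPHA - 1.0) * (1.0 - (THETA / (u + THETA)) ** (ALPHA - 1.0))


payment_per_loss = pareto_lev(50_500.0)
members_per_employee = R * BETA_NB / (1.0 - (1.0 + BETA_NB) ** -R)
losses_per_employee = BETA_GEO * members_per_employee
payments_up_to_limit = losses_per_employee * payment_per_loss
insurer_cost = payments_up_to_limit - 150.02   # less expected deductibles

print(round(payment_per_loss, 2), round(members_per_employee, 5),
      round(losses_per_employee, 6), round(payments_up_to_limit, 2),
      round(insurer_cost, 2))
```

Small differences from the printed values are expected because the parameter estimates are quoted to limited precision.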

Then the expected cost to the insurer is the difference 1,375.86 - 150.02 = 
1,225.84. As a final note, it is not possible to use any method other than 
simulation if the goal is to obtain the probability distribution of the insurer's 
payments. This situation is similar to that of Example 17.7, where it is easy to 
get the overall distribution, as well as the distribution for the insured (in this 


if payments for 
14.6 ANOTHER AGGREGATE LOSS EXAMPLE


Careful modeling has revealed that individual losses have the lognormal
distribution with μ = 10.5430 and σ = 2.31315. It has also been determined
that the number of losses has the Poisson distribution with λ = 0.0154578.

Begin by considering excess of loss reinsurance in which the reinsurance
pays the excess over a deductible, d, up to a maximum payment u - d, where
u is the limit established in the primary coverage. There are two approaches
available to create the distribution of reinsurer payments. The first is to work
with the distribution of payments per payment. On this basis, the severity
distribution is mixed, with pdf

\[ f_Y(x) = \frac{f_X(x + d)}{1 - F_X(d)}, \quad 0 < x < u - d, \]

and discrete probability

\[ \Pr(Y = u - d) = \frac{1 - F_X(u)}{1 - F_X(d)}. \]

This distribution would then be discretized for use with the recursive formula
or the FFT or approximated by a histogram for use with the Heckman-Meyers
method. Regardless, the frequency distribution must be adjusted to reflect
the distribution of the number of payments as opposed to the number of losses.
The new Poisson parameter will be λ[1 - F_X(d)].
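
The per-payment severity and the adjusted Poisson parameter can be sketched as follows. The lognormal and Poisson parameters are those quoted above; the deductible d = 250,000 and limit u = 1,000,000 are hypothetical values chosen only for illustration, and the function names are not from the text. The lognormal cdf is written with the error function so that only the standard library is needed.

```python
import math

MU, SIGMA, LAM = 10.5430, 2.31315, 0.0154578   # values quoted in the text


def lognormal_cdf(x):
    return 0.5 * (1.0 + math.erf((math.log(x) - MU) / (SIGMA * math.sqrt(2.0))))


def lognormal_pdf(x):
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (x * SIGMA * math.sqrt(2.0 * math.pi))


def per_payment_model(d, u):
    """Return the density on (0, u - d), the mass at u - d, and the new lambda."""
    survival_d = 1.0 - lognormal_cdf(d)

    def density(x):
        return lognormal_pdf(x + d) / survival_d

    mass_at_limit = (1.0 - lognormal_cdf(u)) / survival_d
    return density, mass_at_limit, LAM * survival_d


# Hypothetical deductible and limit, for illustration only.
density, mass, lam_payments = per_payment_model(d=250_000.0, u=1_000_000.0)
print(mass, lam_payments)
```

The continuous part and the point mass together integrate to one, which is a convenient check before discretizing the severity for the recursive formula.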


14.6.1 Distribution for a single policy 

We consider the distribution of losses for a single policy for various
combinations of d and u. We use the Poisson parameter for the combined group and
have employed the recursive algorithm with a discretization interval of 10,000
and the method of rounding. In all cases the 90th and 99th percentiles are
zero, indicating that most of the time the excess of loss reinsurance will involve
no payments. This is not surprising because the probability there will be no
losses is exp(-0.0154578) = 0.985 and with the deductible this probability is
even higher. The mean, standard deviation, and coefficient of variation for
various combinations of d and u are given in Table 14.13.

It is not surprising that the risk (as measured by the coefficient of variation, 
C.V.) increases when either the deductible or the limit is increased. It is also 
clear that the risk of writing one policy is extreme. 


14.6.2 One hundred policies–excess of loss


We next consider the possibility of reinsuring 100 policies. If we assume that
the same deductible and limit apply to all of them, the aggregate distribution
requires only that the frequency be changed. When 100 independent Poisson
random variables are added, the sum has a Poisson distribution with the
original parameter multiplied by 100. The same process was repeated with
the revised Poisson parameter. The results appear in Table 14.14.

As must be the case with independent policies, the mean is 100 times the 
mean for one policy and the standard deviation is 10 times the standard 
deviation for one policy. This implies that the coefficient of variation will be 
one-tenth of its previous value. In all cases, the 99th percentile is now above
zero. This may make it appear that there is more risk, but in reality it just 
indicates that it is now more likely that a claim will be paid. 
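The scaling rule for independent policies can be illustrated with a few lines of Python. The one-policy figures are the first row of Table 14.13 (mean 778, standard deviation 18,858); the function name is a hypothetical helper introduced only for this sketch.

```python
import math

def scale_independent_policies(mean_one, sd_one, n):
    """Mean, standard deviation, and C.V. of the sum of n independent, identical policies."""
    mean_n = n * mean_one            # means add
    sd_n = math.sqrt(n) * sd_one     # variances add, so the sd grows with sqrt(n)
    return mean_n, sd_n, sd_n / mean_n

mean_100, sd_100, cv_100 = scale_independent_policies(778.0, 18_858.0, 100)
cv_one = 18_858.0 / 778.0
print(mean_100, sd_100, round(cv_one, 2), round(cv_100, 3))
```

With n = 100 the mean is multiplied by 100, the standard deviation by 10, and the coefficient of variation falls to one-tenth of its one-policy value.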


14.6.3 One hundred policies— aggregate stop-loss 

We now turn to aggregate reinsurance. Assume policies have no individual
deductible but do have a policy limit of u. There are again 100 policies and
this time the reinsurer pays all aggregate losses in excess of an aggregate
deductible of a. For a given limit, the severity distribution is modified as
before, the Poisson parameter is multiplied by 100, and then some algorithm
is used to obtain the aggregate distribution. Let this distribution have cdf
$F_S(s)$ or, in the case of a discretized distribution (as will be the output from
the recursive algorithm or the FFT), a pf $f_S(s_i)$ for i = 1, ..., n. For a
deductible of a, the corresponding functions for the reinsurance distribution
$S_r$ are

$$F_{S_r}(s) = F_S(s + a), \quad s \ge 0,$$

$$f_{S_r}(0) = F_S(a) = \sum_{s_i \le a} f_S(s_i),$$

$$f_{S_r}(r_i) = f_S(r_i + a), \quad r_i = s_i - a, \quad i = 1, \ldots, n.$$

Moments and percentiles may be determined in the usual manner. 
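The transformation from the discretized aggregate distribution to the stop-loss distribution can be sketched as follows. The five-point distribution and the deductible of 20,000 are made-up values for illustration; only the transformation itself follows the formulas above.

```python
def stop_loss_pf(s_values, probs, a):
    """pf of S_r = max(S - a, 0) from a discrete pf of S."""
    # All probability at or below the deductible collapses to zero payment.
    mass_at_zero = sum(p for s, p in zip(s_values, probs) if s <= a)
    r_values, r_probs = [0.0], [mass_at_zero]
    for s, p in zip(s_values, probs):
        if s > a:
            r_values.append(s - a)   # r_i = s_i - a
            r_probs.append(p)        # f_{S_r}(r_i) = f_S(r_i + a)
    return r_values, r_probs

def mean_of(values, probs):
    return sum(v * p for v, p in zip(values, probs))

# Hypothetical discretized aggregate distribution on a span of 10,000.
s_values = [0, 10_000, 20_000, 30_000, 40_000]
probs = [0.50, 0.20, 0.15, 0.10, 0.05]

r_values, r_probs = stop_loss_pf(s_values, probs, a=20_000)
stop_loss_mean = mean_of(r_values, r_probs)
```

Moments and percentiles of the reinsurer's payment then come directly from `r_values` and `r_probs`.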

Using the recursive formula with an interval of 10,000, results for various 
stop-loss deductibles and individual limits are given in Table 14.15. The 
results are similar to those for the excess of loss coverage. For the most part,
as either the individual limit or the aggregate deductible is increased, the risk, 
as measured by the coefficient of variation, increases. The exception is when 
both the limit and the deductible are 5,000,000. This is a risky setting because 
it is the only one in which two losses are required before the reinsurance will 
take effect. 

Now suppose the 100 policies are known to have different Poisson parameters
(but the same severity distribution). Assume 30 have $\lambda = 0.0162249$ and
so the number of claims from this subgroup is Poisson with mean

30(0.0162249) = 0.486747.

For the second group (50 members) the parameter is 50(0.0174087) = 0.870435
and for the third group (20 members) it is 20(0.0096121) = 0.192242. There
are three methods for obtaining the distribution of the sum of the three
separate aggregate distributions.

1. Because the sum of independent Poisson random variables is still Poisson,
the total number of losses has the Poisson distribution with parameter
1.549424. The common severity distribution remains the lognormal.
This reduces to a single compound distribution which can be evaluated
by any method.

2. Obtain the three aggregate distributions separately. If the recursive or
FFT algorithms are used, the result will be three discrete distributions.
The distribution of their sum can be obtained by using convolutions.
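The agreement between methods 1 and 2 can be checked numerically. The sketch below uses the Panjer recursion for a compound Poisson distribution; the three-point discrete severity is a hypothetical stand-in for a discretized lognormal, and the function names are illustrative.

```python
import math

def compound_poisson_pf(lam, severity, n_points):
    """Compound Poisson pf on 0..n_points-1 via the Panjer recursion.

    severity[k] = Pr(X = k) on an integer grid; lam is the Poisson mean.
    """
    g = [0.0] * n_points
    g[0] = math.exp(-lam * (1.0 - severity[0]))
    for s in range(1, n_points):
        upper = min(s, len(severity) - 1)
        g[s] = lam / s * sum(k * severity[k] * g[s - k] for k in range(1, upper + 1))
    return g

def convolve(p, q, n_points):
    """Convolution of two pfs, keeping only the first n_points values."""
    out = [0.0] * n_points
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j < n_points:
                out[i + j] += p_i * q_j
    return out

# Poisson parameters for the three subgroups, as given in the text.
group_lambdas = [30 * 0.0162249, 50 * 0.0174087, 20 * 0.0096121]
total_lambda = sum(group_lambdas)

severity = [0.0, 0.5, 0.3, 0.2]   # hypothetical discretized severity
N = 40

# Method 1: one compound Poisson with the combined parameter.
method_1 = compound_poisson_pf(total_lambda, severity, N)

# Method 2: three separate aggregate distributions, then convolutions.
parts = [compound_poisson_pf(lam, severity, N) for lam in group_lambdas]
method_2 = convolve(convolve(parts[0], parts[1], N), parts[2], N)

max_gap = max(abs(a - b) for a, b in zip(method_1, method_2))
```

Under these assumptions the two vectors agree to rounding error, which reflects the fact that method 1 needs only one aggregate calculation while method 2 needs three plus the convolutions.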


Table 14.13 Excess of loss reinsurance, one policy

Deductible    Limit                 Standard
(10^6)        (10^6)       Mean    deviation      C.V.
  0.5                       778       18,858     24.24
  0.5                     2,910       94,574     32.50
  0.5                     3,809      144,731     38.00
  0.5                     4,825      229,284     47.52
  0.5                     5,415      306,359     56.58
  1.0                     2,132       80,354     37.69
  1.0                     3,031      132,516     43.72
  1.0                     4,046      219,475     54.24
  1.0                     4,636      298,101     64.30
  5.0                       899       62,556     69.58
  5.0                     1,914      162,478     84.89
  5.0                     2,504      249,752     99.74
 10.0                     1,015      111,054    109.41
 10.0                     1,605      205,939    128.71

Table 14.14 Excess of loss reinsurance, 100 policies

[Table entries not recoverable. Columns: Deductible (10^6); Limit (10^6);
Mean (10^3); Standard deviation (10^3); C.V.; Percentiles (10^3): 90, 99.]
storage capacity of the computer becomes an obstacle. T
is a small-scale version of the problem and indicates a si 


being used is of length 6. Determine an approximation to the probability
function for the sum of the two random variables.

  x    f_1(x)    f_2(x)
  0     0.3       0.4
  2     0.2       0.3
  4     0.2       0.2
  6     0.2       0.1
  8     0.1       0.0


available, this method requires only one aggregate calculation. 

The advantage of method 2 is that there is no restriction on the frequency
and severity components of the separate models. The drawback is the expansion of
computer storage. For example, if the first distribution requires 3,000 points, 
the second one 5,000 points, and the third one 2,000 points (with the same 
discretization interval being used for the three distributions), the combined 
distribution will require 10,000 points. More will be said about this at the 
end of this section. 

The third method also has no restriction on the separate models. It has 
the same drawback as the second method, but here the expansion must be 


example, the probability of 0.16 at x = 8 is allocated to the points 5.6 and 8.4. 
Because 8 is 2.4/2.8 of the way from 5.6 to 8.4, six-sevenths of the probability 
is placed at 8.4 and the remaining one-seventh is placed at 5.6. The complete 
allocation process appears in Table 14.16. The probabilities allocated to each 
multiple of 2.8 are then combined to produce the approximation to the true 
distribution of the sum. The approximating distribution is given below. 


Table 14.16 Allocation of probabilities

                      Lower                  Upper
  x    Probability    point    Allocated    point    Allocated
  0       0.12          0        0.1200
  2       0.17          0        0.0486       2.8      0.1214
  4       0.20          2.8      0.1143       5.6      0.0857
  6       0.21          5.6      0.1800       8.4      0.0300
  8       0.16          5.6      0.0229       8.4      0.1371
 10       0.09          8.4      0.0386      11.2      0.0514
 12       0.04         11.2      0.0286      14.0      0.0114
 14       0.01         14.0      0.0100
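The convolution and the reallocation to a coarser grid can be reproduced with exact rational arithmetic. This is a sketch rather than the authors' program: the function names are hypothetical, and the span of 2.8 is the one used in the allocation described above.

```python
from fractions import Fraction

def convolve(points_1, probs_1, points_2, probs_2):
    """Exact distribution of the sum of two independent discrete variables."""
    out = {}
    for x1, p1 in zip(points_1, probs_1):
        for x2, p2 in zip(points_2, probs_2):
            if p1 * p2 > 0:
                out[x1 + x2] = out.get(x1 + x2, 0) + p1 * p2
    return dict(sorted(out.items()))

def reallocate(dist, span):
    """Split each probability linearly between the two nearest multiples of span."""
    out = {}
    for x, p in dist.items():
        lower = (x // span) * span          # nearest grid point at or below x
        w_upper = (x - lower) / span        # share sent to the upper grid point
        out[lower] = out.get(lower, 0) + p * (1 - w_upper)
        if w_upper > 0:
            upper = lower + span
            out[upper] = out.get(upper, 0) + p * w_upper
    return dict(sorted(out.items()))

x = [Fraction(v) for v in (0, 2, 4, 6, 8)]
f1 = [Fraction(s) for s in ("0.3", "0.2", "0.2", "0.2", "0.1")]
f2 = [Fraction(s) for s in ("0.4", "0.3", "0.2", "0.1", "0.0")]

exact_sum = convolve(x, f1, x, f2)
approx = reallocate(exact_sum, Fraction("2.8"))

for point, prob in approx.items():
    print(f"{float(point):5.1f}  {float(prob):.4f}")
```

The resulting vector has six points (0, 2.8, ..., 14.0) with probabilities of roughly 0.1686, 0.2357, 0.2886, 0.2057, 0.0800, and 0.0214, which are the column sums of the allocations in Table 14.16. Linear allocation of this kind preserves the mean of the distribution.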




likely to be very small. When they are multiplied to create the convolution,
the probabilities at the ends of the new, long vector may be so small that they
can be ignored. Thus those cells need not be retained and do not add to the
storage problem.

Many more refinements are possible. In the appendix to the article by
Bailey [9] a method that preserves the first three moments is presented.
He also provides guidance with regard to the elimination or combination of
storage locations with exceptionally small probability.

14.7 COMPREHENSIVE EXERCISES 

The exercises in this section are similar to the examples presented earlier in
this chapter. They are based on questions that arose in published papers.

14.2 In New York there were special funds for some infrequent occurrences 
under workers compensation insurance. One was the event of a case being 
reopened. Hipp [57] collected data on the time from an accident to when the 
case was reopened. These covered cases reopened between April 24, 1933 and 
December 31, 1936. The data appear in Table 14.17. Determine a parametric 
model for the time from accident to reopening. By definition, at least seven 
years must elapse before a claim can qualify as a reopening, so the model 
should be conditioned on the time being at least seven years. 

14.3 In the first of two papers by Arthur Bailey [6], written in 1942 and
1943, he observed on page 51 that "Another field where a knowledge of
sampling distributions could be used to advantage is that of rating procedures
for deductibles and excess coverages." In the second paper [7], he presented
some data (Table 14.18) on the distribution of loss ratios. In that paper he
made the statement that the popular lognormal model provided a good fit
and passed the chi-square test. Does it? Is there a better model?


Table 14.17 Time to reopening of a workers compensation claim for Exercise 14.2

Years     No. reopened     Years     No. reopened
7-8            27          15-16          13
8-9            43          16-17           9
9-10           42          17-18           7
10-11          37          18-19           4
11-12          25          19-20           4
12-13          19          20-21           1
13-14          23          21+             0
14-15          10
                           Total         264


Table 14.18 Loss ratio data for Exercise 14.3

Loss ratio    Number
0.0-0.2         16
0.2-0.4         27
0.4-0.6         22
0.6-0.8         29
0.8-1.0         19
1.0-1.5         32
1.5-2.0         10
2.0-3.0         13
3.0+             5
Total          173


14.4 In 1979, Hewitt and Lefkowitz [56] looked at automobile bodily injury 
liability data (Table 14.19) and concluded that a two-point mixture of the 
gamma and loggamma distributions [If X has a gamma distribution, then 
Y = exp(X) has the loggamma distribution. Note that its support begins at 
1] was superior to the lognormal. Do you agree? Also consider the gamma 
and loggamma distributions. 

14.5 A 1980 paper by Patrik [102] contained many of the ideas recommended
in this text. One of his examples was data supplied by the Insurance Services
Office on Owners, Landlords, and Tenants bodily injury liability. Policies
at two different limits were studied. Both were for policy year 1976 with
losses developed to the end of 1978. The groupings in Table 14.20 have been
condensed from those in the paper. Can the same model (with or without
identical parameters) be used for the two limits?










Table 14.19 Automobile bodily injury liability losses for Exercise 14.4

Loss        Number     Loss            Number
0-50          27       750-1,000          8
50-100         4       1,000-1,500       16
100-150        1       1,500-2,000        8
150-200        2       2,000-2,500       11
200-250        3       2,500-3,000        6
250-300        4       3,000-4,000       12
300-400        5       4,000-5,000        9
400-500        6       5,000-7,500       14
500-750       13       7,500-            40
                       Total            189

Table 14.20 OLT bodily injury liability losses for Exercise 14.5

Loss (10^3)   300 Limit   500 Limit     Loss (10^3)   300 Limit   500 Limit
0-0.2           10,075       3,977      11-12               56          22
0.2-0.5          3,049       1,095      12-13               47          23
0.5-1            3,263       1,152      13-14               20           6
1-2              2,690         991      14-15              151          51
2-3              1,498         594      15-20              151          54
3-4                964         339      20-25              109          44
4-5                794         307      25-50              154          53
5-6                261         103      50-75               24          14
6-7                191          79      75-100              19           5
7-8                406         141      100-200             22           6
8-9                114          52      200-300              6           9
9-10               279          89      300-500             10^a         3
10-11               58          23      500-                             0
                                        Totals          24,411       9,232

^a Losses for 300+.


14.6 The data in Table 14.21 were collected by Fisher [37] on coal mining 
disasters in the United States over 25 years ending about 1910. This particular 
Table 14.21 Mining disasters per year for Exercise 14.6


No. of disasters 


No. of disasters 







Table 14.22 Number of accidents by number of violations for Exercise 14.7


No. of violations 

〕 1 

2 

3 

4 

5+ 

5 17,081 

6,729 

3,098 

1,548 

1,893 

( 3,131 

1,711 

963 

570 

934 

r 353 

266 

221 

138 

287 

i 41 

44 

31 

34 

66 

1 6 

6 

6 

4 

14 

)1 

1 

1 

3 

1 

Table 14.23 Number of accidents per year for Exercise 14.8

No. of accidents   No. of stretches   No. of accidents   No. of stretches
       0                  99                 6                   4
       1                  65                 7                   0
       2                  57                 8                   3
       3                  35                 9                   4
       4                  20                10                   0
       5                  10                11                   1


Is a negative binomial distribution appropriate? If so, are the
same parameters appropriate? Is it reasonable to conclude that the expected 
number of accidents increases with the number of violations? 

14.8 In 1961, Simon [122] proposed using the zero-modified negative binomial 
distribution. His data set was the number of accidents in one year along 
various one-mile stretches of Oregon highway. The data appear in Table 
14.23. Simon claimed that the zero-modified negative binomial distribution 
was superior to the negative binomial. Is he correct? Is there a better model?












Part V 


Adjusted estimates and 
simulation 


Interpolation and 
smoothing 


15.1 INTRODUCTION 

Methods of model building discussed to this point are based on ideas that came
primarily from the fields of probability and statistics. Data are considered to
be observations from a sample space associated with a probability distribution.
The quantities to be estimated are functions of that probability distribution,
for example, pdf, cdf, hazard rate (force of mortality), mean, variance.

In contrast, the methods described in this chapter have their origins in
the field of numerical analysis, without specific considerations of probabilistic
statistical concepts.

In practice, many of these numerical methods have been subsequently
adapted to a probability and statistics framework. Although the key ideas
of the methods are easy to understand, most of these techniques are
computationally demanding, thus requiring computer programs. The techniques
described in this chapter are at the lowest end of the complexity scale.

The objective is to fit a smooth curve through a set of data according to 
some specified criteria. This has many applications in actuarial science as it 
lei is not prespecified by a simple
of parameters. The methods in 
ape of the resulting curve. They 
hape is complex, 
the probabilities of death within 
iion q x . These probabilities de- 
of neonatal deaths, are relatively 


is called graduation. The set of points typically represents observed rates 
of mortality (probability of death within one year) or rates of some other 
contingency such as disablement, unemployment, or accident. The methods 
described in this chapter are not restricted to these kinds of applications. 
Indeed, they can be applied to any set of successive points. 

In graduation theory, it is assumed that there is some underlying, but 
unobservable, true curve or function that is to be estimated or approximated. 





involving differences were developed so that an actuary could develop smo( 


With the advent of computers in the 1950s and 1960s, many computerized 
mathematical procedures were developed. Among them was the theory of 
splines, this time not mechanical in nature. As with graduation, the objective 
of splines is to find an appropriate balance between fit and smoothness. The 
solutions that were developed were in terms of linear systems of equations 
that could be easily solved on a computer. The modern theory of splines 
dates back to Schoenberg [117]. 

In this chapter, we focus only on the modern techniques of spline
interpolation and smoothing. These techniques are so powerful and flexible that they
have largely superseded the older methods.


15.2 POLYNOMIAL INTERPOLATION AND SMOOTHING 

Consider n + 1 distinct points labeled $(x_0, y_0), \ldots, (x_n, y_n)$ with
$x_0 < x_1 < x_2 < \cdots < x_n$. A unique polynomial of degree n can be passed through


f(x) =


j = 0, 1, ..., n. (15.2)

Equations (15.2) form a system of n + 1 equations in n + 1 unknowns {a_j; j =


system of equations may be difficult.

Fortunately, the solution can be explicitly written without solving the
system of equations. The solution is known as Lagrange's formula:

$$
\begin{aligned}
f(x) &= y_0 \frac{(x - x_1)(x - x_2) \cdots (x - x_n)}{(x_0 - x_1)(x_0 - x_2) \cdots (x_0 - x_n)}
      + y_1 \frac{(x - x_0)(x - x_2) \cdots (x - x_n)}{(x_1 - x_0)(x_1 - x_2) \cdots (x_1 - x_n)} \\
     &\quad + \cdots
      + y_n \frac{(x - x_0)(x - x_1) \cdots (x - x_{n-1})}{(x_n - x_0)(x_n - x_1) \cdots (x_n - x_{n-1})} \\
     &= \sum_{j=0}^{n} y_j
        \frac{(x - x_0) \cdots (x - x_{j-1})(x - x_{j+1}) \cdots (x - x_n)}
             {(x_j - x_0) \cdots (x_j - x_{j-1})(x_j - x_{j+1}) \cdots (x_j - x_n)}.
\end{aligned}
\tag{15.3}
$$
Table 15.1 Mortality rates for Example 15.1

                                                   Estimated
                    Exposed          Actual        Mortality Rate
  j     Ages        to Risk          Deaths        Per 1,000
  0     25-29          35,700           139           3.89
  1     30-34         244,066           599           2.45
  2     35-39         741,041         1,842           2.49
  3     40-44       1,250,601         4,771           3.81
  4     45-49       1,746,393        11,073           6.34
  5     50-54       2,067,008        21,693          10.49
  6     55-59       1,983,710        31,612          15.94
  7     60-64       1,484,347        39,948          26.91
  8     65-69         988,980        40,295          40.74
  9     70-74         559,049        33,292          59.55
 10     75-79         241,497        20,773          86.02
 11     80-84          78,229        11,376         145.42
 12     85-89          15,411         2,653         172.15
 13     90-94           2,552           589         230.80
 14     95-               162            44         271.60
        Total      11,438,746       220,699



To verify that (15.3) is the collocation polynomial, note that each term is a
polynomial of degree n and that when $x = x_j$ the right-hand side of (15.3)
takes on value $y_j$ for each of j = 0, 1, 2, ..., n.
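Lagrange's formula translates almost directly into code. The sketch below uses exact rational arithmetic and, as an illustration, the three points (2,50), (4,25), and (5,20) that appear later in Example 15.3; the function name is a hypothetical helper.

```python
from fractions import Fraction

def lagrange_interpolate(xs, ys, x):
    """Evaluate the collocation polynomial through (xs[j], ys[j]) at x."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = Fraction(yj)
        for i, xi in enumerate(xs):
            if i != j:
                # Each factor is 1 at x = xj and 0 at x = xi.
                term *= Fraction(x - xi, xj - xi)
        total += term
    return total

xs, ys = [2, 4, 5], [50, 25, 20]
values_at_knots = [lagrange_interpolate(xs, ys, x) for x in xs]
value_at_3 = lagrange_interpolate(xs, ys, 3)
print(values_at_knots, value_at_3)
```

Evaluating at the knots returns the original y values, which is the verification described above. Between the knots, for example at x = 3, the quadratic gives 35.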

The n-degree polynomial f(x) provides interpolation between $(x_0, y_0)$ and
$(x_n, y_n)$ and passes through all interior points $\{(x_j, y_j); j = 1, \ldots, n - 1\}$.

However, for large n, the function f(x) can exhibit excessive oscillation; that
is to say, it can be very "wiggly." This is particularly problematic when there
is some "noise" in the original series $\{(x_j, y_j); j = 0, \ldots, n\}$. Such noise can
be caused by measurement error or random fluctuation.

Example 15.1 The data in Table 15.1 are from Miller [94], p. 62. They 
are observed mortality rates in five-year age groups. The estimated mortality 
rates are obtained as the ratio of the dollars of death claims paid to the total 
dollars exposed to death. 1 The rates are plotted in Figure 15.1. 

The estimates of mortality rates at each age are the maximum likelihood
estimates of the true rates assuming mutually independent binomial models

1 Deaths and exposures are in units of $1,000. It is common in mortality studies to count
dollars rather than lives in order to give more weight to the larger policies. The mortality 



To avoid the excessive oscillatory behavior or wiggliness, lower order
polynomials could be used for interpolation. For example, successive values could
be joined by straight lines. However, the successive interpolating lines form a
jagged series because of the "kinks" at the points of juncture.

Another method is to piece together a sequence of low-degree polynomials.
For example, a quadratic function can be collocated with successive points
at $(x_0, x_1, x_2), (x_2, x_3, x_4), \ldots$. However, there will not be smoothness at the
points of juncture $x_2, x_4, \ldots$ in the sense that the interpolating function will
have kinks at these points with slopes and curvature not matching. One way
to get rid of the kinks is to force some left-hand and right-hand derivatives
to be equal at these points. This creates apparent smoothness at the points
of juncture of the successive polynomials. This is the key idea behind splines.
Interpolating splines are piecewise polynomial functions that pass through the
given data points but that have the added feature that they are smooth at
the points of juncture of the successive pieces. The order of the polynomial is
kept low to minimize "wiggly" behavior. Interpolation using cubic splines is
introduced in Section 15.3.

An alternative to interpolation is smoothing, or, more precisely, fitting a
smooth function to the observed data but not requiring that the function pass
through each data point. Polynomials allow for great flexibility of shapes.
However, this flexibility of shape also makes polynomials quite risky to use
for extrapolation, especially for polynomials of high degree. This was the
case in Figure 15.2, where the extrapolated values, even for one year, were
completely unreliable. As with the fitting of other models earlier in this book,
a fitting criterion needs to be selected in order to fit a model. We will illustrate
the use of polynomial smoothing by using a least squares criterion. Figures
15.3 through 15.6 show the fits of polynomials of degree 2, 3, 4, and 5 to the data of
Example 15.1. It should be noted that the fit improves with each increase
in degree because there is one additional degree of freedom in carrying out
the fit. However, it can be seen that as each degree is added the behavior
of the extrapolated values for only a few years below age 27 and above age
97 changes quite significantly. Smoothing splines provide one solution to this
dilemma. Smoothing splines are just like interpolating splines except that the
spline is not required to pass through the data points but, rather, should be
close to the data points. Cubic splines limit the degree of the polynomial to



Fig. 15.3 Second-degree polynomial fit. 


Fig. 15.4 Third-degree polynomial fit. 
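A least squares polynomial fit of the kind shown in these figures can be sketched in plain Python. The rates are the per-1,000 values from Table 15.1, placed at the middle age of each interval (27, 32, ..., 97). Rescaling the ages to [-1, 1] and solving the normal equations by Gaussian elimination are implementation choices made for this sketch, not a description of how the figures were produced.

```python
def polyfit_least_squares(xs, ys, degree):
    """Least squares polynomial coefficients (lowest power first) via normal equations."""
    m = degree + 1
    A = [[sum(x ** (r + s) for x in xs) for s in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for s in range(col, m):
                A[r][s] -= factor * A[col][s]
            b[r] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(A[r][s] * coef[s] for s in range(r + 1, m))
        coef[r] = (b[r] - tail) / A[r][r]
    return coef

def evaluate(coef, x):
    """Horner evaluation of the fitted polynomial."""
    result = 0.0
    for c in reversed(coef):
        result = result * x + c
    return result

ages = [27 + 5 * j for j in range(15)]
rates = [3.89, 2.45, 2.49, 3.81, 6.34, 10.49, 15.94, 26.91,
         40.74, 59.55, 86.02, 145.42, 172.15, 230.80, 271.60]
scaled = [(age - 62) / 35 for age in ages]   # map ages 27..97 onto [-1, 1]

sse = {}
for degree in (2, 3, 4, 5):
    coef = polyfit_least_squares(scaled, rates, degree)
    sse[degree] = sum((evaluate(coef, x) - y) ** 2 for x, y in zip(scaled, rates))
    print(degree, round(sse[degree], 2))
```

The sum of squared errors cannot increase as the degree rises, which is the sense in which the fit improves with each additional degree of freedom. Evaluating the fitted polynomials a few years outside 27 to 97 is a simple way to see how sensitive the extrapolated values are to the chosen degree.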


15.2.1 Exercises 

15.1 Determine the equation of the polynomial that interpolates the points 
(2,50), (4,25), and (5,20). 

15.2 Determine the equation of the straight line that best fits the data of 


15.3 CUBIC SPLINE INTERPOLATION 

Cubic splines are piecewise cubic functions that have the property that the 
first and second derivatives can be forced to be continuous, unlike the approach 
of successive polynomials with jagged points of juncture. 

Cubic splines are used extensively in computer-aided design and
manufacturing in creating surfaces that are smooth to the touch and to the eye. The
re p_j and q_j are undetermined constants
differentiate (15.6) twice.] 
ubstituting Xj and Xj + ± into (15.6) yields 

Vi = + Pjl 



because = yj and fj(x j+1 ) = y j+1 . 

We now obtain the constants and On froi 



















be added using this procedure. 

Other endpoint constraints may be a bit more complicated and may require 
modification of the first and last of the system of equations (15.14), which will 
result in changes in the matrix H and the vector v in (15.17). 
derivatives being identical at both ends of the first and last intervals; that is,
$m_0 = m_1$ and $m_n = m_{n-1}$. As a result, the first and last equations of (15.14)
are replaced by

$$(3h_0 + 2h_1) m_1 + h_1 m_2 = u_1,$$

$$h_{n-2} m_{n-2} + (2h_{n-2} + 3h_{n-1}) m_{n-1} = u_{n-1}. \tag{15.19}$$

Case 4: Cubic Runout Spline 

This method requires the cubic over $[x_0, x_1]$ to be an extension of that
over $[x_1, x_2]$, thus imposing the same cubic function over the entire interval
$[x_0, x_2]$. This is also known as the not-a-knot condition. A similar condition
is imposed at the other end.

This can be achieved by requiring that the third derivatives also agree
at $x_1$ and $x_{n-1}$; that is,

$$f_0'''(x_1) = f_1'''(x_1)$$

and

$$f_{n-2}'''(x_{n-1}) = f_{n-1}'''(x_{n-1}).$$

Because the third derivative is then constant throughout $[x_0, x_2]$ and also
throughout $[x_{n-2}, x_n]$, the second derivative will be a linear function
throughout the same two intervals. Hence, the slope of the second derivative will be
the same in any subintervals within $[x_0, x_2]$ and within $[x_{n-2}, x_n]$. Thus, we
can write

$$\frac{m_1 - m_0}{h_0} = \frac{m_2 - m_1}{h_1}, \qquad
\frac{m_n - m_{n-1}}{h_{n-1}} = \frac{m_{n-1} - m_{n-2}}{h_{n-2}},$$

or equivalently

$$m_0 = m_1 - \frac{h_0 (m_2 - m_1)}{h_1}, \qquad
m_n = m_{n-1} + \frac{h_{n-1} (m_{n-1} - m_{n-2})}{h_{n-2}}. \tag{15.20}$$

Then the first and last equations of (15.14) are replaced by

$$\left(3h_0 + 2h_1 + \frac{h_0^2}{h_1}\right) m_1 + \left(h_1 - \frac{h_0^2}{h_1}\right) m_2 = u_1,$$

$$\left(h_{n-2} - \frac{h_{n-1}^2}{h_{n-2}}\right) m_{n-2}
+ \left(2h_{n-2} + 3h_{n-1} + \frac{h_{n-1}^2}{h_{n-2}}\right) m_{n-1} = u_{n-1}. \tag{15.21}$$
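As a check on (15.21), the first replacement equation can be derived in one step. The sketch below assumes that the interior equations (15.14) have the standard form $h_{j-1} m_{j-1} + 2(h_{j-1} + h_j) m_j + h_j m_{j+1} = u_j$. That form is consistent with the Case 3 replacement equations, but it is an assumption here.

```latex
\begin{align*}
u_1 &= h_0 m_0 + 2(h_0 + h_1) m_1 + h_1 m_2 \\
    &= h_0 \left[ m_1 - \frac{h_0 (m_2 - m_1)}{h_1} \right] + 2(h_0 + h_1) m_1 + h_1 m_2 \\
    &= \left( 3h_0 + 2h_1 + \frac{h_0^2}{h_1} \right) m_1
       + \left( h_1 - \frac{h_0^2}{h_1} \right) m_2 .
\end{align*}
```

The last equation follows in the same way by substituting the expression for $m_n$ into the equation for $j = n - 1$.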
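Case 5: Clamped Cubic Spline

This procedure fixes the slopes $f_0'(x_0)$ and $f_{n-1}'(x_n)$ of the spline at each
endpoint. In this case, from (15.11) and (15.12), the second derivatives are

$$m_0 = \frac{3}{h_0} \left[ \frac{y_1 - y_0}{h_0} - f_0'(x_0) \right] - \frac{m_1}{2}, \qquad
m_n = \frac{3}{h_{n-1}} \left[ f_{n-1}'(x_n) - \frac{y_n - y_{n-1}}{h_{n-1}} \right] - \frac{m_{n-1}}{2}. \tag{15.22}$$

As a result the first and last equations of (15.14) are replaced by

$$\left( \tfrac{3}{2} h_0 + 2h_1 \right) m_1 + h_1 m_2
= u_1 - 3 \left[ \frac{y_1 - y_0}{h_0} - f_0'(x_0) \right]$$

and

$$h_{n-2} m_{n-2} + \left( 2h_{n-2} + \tfrac{3}{2} h_{n-1} \right) m_{n-1}
= u_{n-1} - 3 \left[ f_{n-1}'(x_n) - \frac{y_n - y_{n-1}}{h_{n-1}} \right],$$

respectively.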
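The natural and clamped cases can be sketched in a short Python routine. Rather than substituting the endpoint expressions into the first and last interior equations, this sketch keeps all n + 1 second derivatives as unknowns and adds one row per endpoint, which is an equivalent formulation. The interior rows assume the standard form $h_{j-1} m_{j-1} + 2(h_{j-1} + h_j) m_j + h_j m_{j+1} = 6[(y_{j+1} - y_j)/h_j - (y_j - y_{j-1})/h_{j-1}]$, and the function name and argument names are hypothetical.

```python
def cubic_spline(xs, ys, end_slopes=None):
    """Coefficients (a, b, c, d) of an interpolating cubic spline.

    On [xs[j], xs[j+1]] the spline is
        a[j] + b[j]*(x - xs[j]) + c[j]*(x - xs[j])**2 + d[j]*(x - xs[j])**3.
    end_slopes=None gives the natural spline; (s0, sn) gives the clamped spline.
    """
    n = len(xs) - 1
    h = [xs[j + 1] - xs[j] for j in range(n)]
    slope = [(ys[j + 1] - ys[j]) / h[j] for j in range(n)]

    # Tridiagonal system for the second derivatives m_0, ..., m_n.
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)    # default end rows force m_0 = m_n = 0 (natural)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for j in range(1, n):
        lower[j], diag[j], upper[j] = h[j - 1], 2.0 * (h[j - 1] + h[j]), h[j]
        rhs[j] = 6.0 * (slope[j] - slope[j - 1])
    if end_slopes is not None:
        s0, sn = end_slopes
        diag[0], upper[0], rhs[0] = 2.0 * h[0], h[0], 6.0 * (slope[0] - s0)
        lower[n], diag[n], rhs[n] = h[n - 1], 2.0 * h[n - 1], 6.0 * (sn - slope[n - 1])

    # Thomas algorithm: forward elimination, then back substitution.
    for j in range(1, n + 1):
        w = lower[j] / diag[j - 1]
        diag[j] -= w * upper[j - 1]
        rhs[j] -= w * rhs[j - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for j in range(n - 1, -1, -1):
        m[j] = (rhs[j] - upper[j] * m[j + 1]) / diag[j]

    a = list(ys[:n])
    b = [slope[j] - h[j] * (2.0 * m[j] + m[j + 1]) / 6.0 for j in range(n)]
    c = [m[j] / 2.0 for j in range(n)]
    d = [(m[j + 1] - m[j]) / (6.0 * h[j]) for j in range(n)]
    return a, b, c, d

# Clamped spline through (2,50), (4,25), (5,20) with end slopes -25 and -4.
a, b, c, d = cubic_spline([2, 4, 5], [50, 25, 20], end_slopes=(-25, -4))
print(b, c, d)
```

For the three points and end slopes used in Example 15.3, this gives b = (-25, -5.75), c = (9.125, 0.5), and d = (-1.4375, 0.25), which matches the spline quoted in Example 15.6. Calling the function without `end_slopes` produces the natural spline instead.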

Example 15.3 From first principles, using conditions I-V, obtain the cubic
spline through the points (2,50), (4,25), and (5,20) with the clamped boundary
conditions $f'(2) = -25$ and $f'(5) = -4$.

Let the cubic spline in the interval from $x_0 = 2$ to $x_1 = 4$ be the polynomial

$$f_0(x) = 50 + b_0(x - 2) + c_0(x - 2)^2 + d_0(x - 2)^3$$

and the spline in the interval from $x_1 = 4$ to $x_2 = 5$ be the polynomial

$$f_1(x) = 25 + b_1(x - 4) + c_1(x - 4)^2 + d_1(x - 4)^3.$$

The six coefficients $b_0, c_0, d_0, b_1, c_1, d_1$ are the unknowns that we need to
determine. From the interpolation conditions

$$f_0(4) = 50 + 2b_0 + 4c_0 + 8d_0 = 25,$$

$$f_1(5) = 25 + b_1 + c_1 + d_1 = 20.$$

From the smoothness conditions at x = 4

$$f_0'(4) = b_0 + 2c_0(4 - 2) + 3d_0(4 - 2)^2 = f_1'(4) = b_1,$$

$$f_0''(4) = 2c_0 + 6d_0(4 - 2) = f_1''(4) = 2c_1.$$

Finally, from the boundary conditions, we get

$$f_0'(2) = b_0 = -25,$$

$$f_1'(5) = b_1 + 2c_1 + 3d_1 = -4.$$





Fig. 15.7 Clamped and natural splines for Example 15.3. 


Figure 15.7 shows the interpolating cubic spline and the corresponding
natural cubic spline that is the solution of Exercise 15.3. It also shows the
function h(x) = 100/x, which also passes through the same three knots. The
slope of the clamped spline at the endpoints is the same as the slope of the 
function h(x). These endpoint conditions force the clamped spline to be much 
closer to the function h(x) than the natural spline. The natural spline has 
endpoint conditions that force the spline to look more like a straight line near 
the ends due to requiring the second derivative to be zero at the endpoints. □ 

The cubic splines in this section all pass through the knots. If smoothing 
is desired, that restriction may be lifted. Smoothing splines are introduced in 
Section 15.6. 

Example 15.4 The data in the last column of Table 15.1 are one-year
mortality rates for the 15 five-year age intervals shown in the first column. The
last interval is treated as 95-99. We have used a natural cubic spline to
interpolate between these values as follows. The listed mortality rate is treated
as the one-year mortality rate for the middle age within the five-year interval.
The resulting values are treated as knots for a natural cubic spline. The
fitted interpolating cubic spline is shown in Figure 15.8 on a logarithmic scale.
The formula for the spline is given in Property I of Definition 15.2. The





Fig. 15.8 Cubic spline fit to mortality data for Example 15.4. 


Table 15.2 Spline coefficients for Example 15.4

  j    x_j    a_j             b_j              c_j              d_j
  0    27     3.8936×10^-3    -3.5093×10^-4     0                2.5230×10^-6
  1    32     2.4543×10^-3    -1.6171×10^-4     3.7844×10^-5    -8.4886×10^-7
  2    37     2.4857×10^-3     1.5307×10^-4     2.5112×10^-5    -5.1079×10^-7
  3    42     3.8150×10^-3     3.6587×10^-4     1.7450×10^-5     2.0794×10^-6
  4    47     6.3405×10^-3     6.9632×10^-4     4.8640×10^-5    -4.3460×10^-6
  5    52     1.0495×10^-2     8.5678×10^-4    -1.6550×10^-5     1.2566×10^-5
  6    57     1.5936×10^-2     1.6337×10^-3     1.7194×10^-4    -1.1922×10^-5
  7    62     2.6913×10^-2     2.4590×10^-3    -6.8828×10^-6     1.3664×10^-5
  8    67     4.0744×10^-2     3.4150×10^-3     1.9808×10^-4    -2.5761×10^-5
  9    72     5.9551×10^-2     3.4638×10^-3    -1.8833×10^-4     1.1085×10^-4
 10    77     8.6018×10^-2     9.8939×10^-3     1.4744×10^-3    -2.1542×10^-4
 11    82     1.4542×10^-1     8.4813×10^-3    -1.7569×10^-3     2.2597×10^-4
 12    87     1.7215×10^-1     7.8602×10^-3     1.6327×10^-3    -1.7174×10^-4
 13    92     2.3080×10^-1     1.1306×10^-2    -9.4349×10^-4     6.2899×10^-5


15.3.2 Exercises 

15.3 Repeat Example 15.3 for the natural cubic spline by removing the 
clamped spline boundary conditions. 

15.4 Construct a natural cubic spline through the points (-2,0), (-1,1),
(0,0), (1,1), and (2,0) by setting up the system of equations (15.16).

15.5 Determine if the following can be cubic splines:

$$f(x) = \begin{cases}
x, & -4 \le x \le 0, \\
x^3 + x, & 0 \le x \le 1, \\
3x^2 - 2x + 1, & 1 \le x \le 9.
\end{cases}$$

$$f(x) = \begin{cases}
x^3, & 0 \le x \le 1, \\
3x^2 - 3x + 1, & 1 \le x \le 2, \\
x^3 - 4x^2 + 13x - 11, & 2 \le x \le 4.
\end{cases}$$


-1 < x < 0, 

0 < a: < 1, 

1 < x < 3. 


15.6 Determine the coefficients a, b, and c so 


$a + b(x - 1) + c(x - 1)^2 + 4(x - 1)^3, \quad 1 \le x \le 3.$


15.8 Consider the function 


$$f(x) = \begin{cases}
28 + 25x + 9x^2 + x^3, & -3 \le x \le -1, \\
26 + 19x + 3x^2 - x^3, & -1 \le x \le 0, \\
26 + 19x + 3x^2 - 2x^3, & 0 \le x \le 3, \\
-163 + 208x - 60x^2 + 5x^3, & 3 \le x \le 4.
\end{cases}$$

(a) Prove that f(x) can be a cubic spline.

(b) Determine which of the five endpoint conditions could have been 
used in developing this spline. 


15.4 APPROXIMATING FUNCTIONS WITH SPLINES 


The natural and clamped cubic splines have a particularly desirable property 
1 1 in approximation to some other continuous 

5 function 
consider the interpolating cubic spline to be an approximation to the func

most popular of such measures is the squared norm

$$\int_{x_0}^{x_n} [f''(x)]^2 \, dx,$$

representing the total squared second derivative.

Now consider any continuous function h(x) that also has continuous first
and second derivatives over some interval $[x_0, x_n]$. Suppose that we select
n - 1 interior knots $\{x_j, j = 1, \ldots, n - 1\}$ with $x_0 < x_1 < x_2 < \cdots < x_n$.
Let f(x) be a cubic spline that collocates with these knots and has endpoint
conditions either

$$f'(x_0) = h'(x_0) \text{ and } f'(x_n) = h'(x_n) \quad \text{(clamped spline)}$$

or

$$f''(x_0) = 0 \text{ and } f''(x_n) = 0 \quad \text{(natural spline)}.$$

The natural or clamped cubic spline f(x) has less total curvature than any
other function h(x) passing through the n + 1 knots, as shown in the following
theorem.

Theorem 15.5 Let f(x) be the natural or clamped cubic spline passing through
the n + 1 given knots. Let h(x) be any function with continuous first and
second derivatives that passes through the same knots. Also, for the clamped
cubic spline assume $h'(x_0) = f'(x_0)$ and $h'(x_n) = f'(x_n)$. Then

$$\int_{x_0}^{x_n} [f''(x)]^2 \, dx \le \int_{x_0}^{x_n} [h''(x)]^2 \, dx.$$


Proof: Let us form the difference D(x) = h(x) - f(x). Then $D''(x) = h''(x) - f''(x)$
and therefore

$$[h''(x)]^2 = [f''(x)]^2 + [D''(x)]^2 + 2 f''(x) D''(x).$$

Integrating both sides produces

$$\int_{x_0}^{x_n} [h''(x)]^2 \, dx = \int_{x_0}^{x_n} [f''(x)]^2 \, dx
+ \int_{x_0}^{x_n} [D''(x)]^2 \, dx + 2 \int_{x_0}^{x_n} f''(x) D''(x) \, dx.$$

The result will be proven if we can show that

$$\int_{x_0}^{x_n} f''(x) D''(x) \, dx = 0$$

because the total curvature of the function h(x),

$$\int_{x_0}^{x_n} [h''(x)]^2 \, dx,$$

will be equal to the total curvature of the spline,

$$\int_{x_0}^{x_n} [f''(x)]^2 \, dx,$$

plus a nonnegative quantity

$$\int_{x_0}^{x_n} [D''(x)]^2 \, dx.$$

Applying integration by parts, we get

$$\int_{x_0}^{x_n} f''(x) D''(x) \, dx = f''(x) D'(x) \Big|_{x_0}^{x_n}
- \int_{x_0}^{x_n} f'''(x) D'(x) \, dx.$$

For the clamped cubic spline, the first term is zero because the clamped
boundary conditions imply that

$$D'(x_0) = h'(x_0) - f'(x_0) = 0,$$

$$D'(x_n) = h'(x_n) - f'(x_n) = 0.$$

For the natural cubic spline $f''(x_0) = f''(x_n) = 0$, which also makes the first
term zero.

The integral in the second term can be divided into subintervals as follows:

$$\int_{x_0}^{x_n} f'''(x) D'(x) \, dx = \sum_{j=0}^{n-1} \int_{x_j}^{x_{j+1}} f'''(x) D'(x) \, dx.$$

Integration by parts in each subinterval yields

$$\int_{x_j}^{x_{j+1}} f'''(x) D'(x) \, dx = f'''(x) D(x) \Big|_{x_j}^{x_{j+1}}
- \int_{x_j}^{x_{j+1}} f^{(4)}(x) D(x) \, dx.$$

The first term is zero because of the interpolation condition

$$D(x_j) = h(x_j) - f(x_j) = 0, \quad j = 0, 1, \ldots, n,$$

because we are only considering functions h(x) that pass through the knots.
The second term is zero because the spline in each subinterval is a cubic
polynomial and has zero fourth derivative. Therefore, for the clamped or
natural cubic spline,

$$\int_{x_0}^{x_n} f''(x) D''(x) \, dx = 0,$$

which proves the result. □


Thus the clamped cubic spline has great appeal if we want to produce a 



clamped cubic splines can be used. Including a clamping condition controls 
the slope at the endpoints. 


Example 15.6 For the clamped cubic spline obtained in Example 15.3
calculate the value of the squared norm measure of curvature. Calculate the same
quantity for the function h(x) = 100/x, which also passes through the given
knots.

The spline function is

$$f(x) = \begin{cases}
50 - 25(x - 2) + 9.125(x - 2)^2 - 1.4375(x - 2)^3, & 2 \le x \le 4, \\
25 - 5.75(x - 4) + 0.5(x - 4)^2 + 0.25(x - 4)^3, & 4 \le x \le 5,
\end{cases}$$

and the second derivative is

$$f''(x) = \begin{cases}
18.25 - 8.625(x - 2) = 35.5 - 8.625x, & 2 \le x \le 4, \\
1 + 1.5(x - 4) = 1.5x - 5, & 4 \le x \le 5.
\end{cases}$$

The total curvature of the spline is

$$\int_2^5 [f''(x)]^2 \, dx = \int_2^4 (35.5 - 8.625x)^2 \, dx + \int_4^5 (1.5x - 5)^2 \, dx
= \int_1^{18.25} \frac{y^2}{8.625} \, dy + \int_1^{2.5} \frac{y^2}{1.5} \, dy = 238.125.$$

For h(x), the second derivative is $h''(x) = 200x^{-3}$ and the curvature is

$$\int_2^5 (200x^{-3})^2 \, dx = 40{,}000 \int_2^5 x^{-6} \, dx = 247.44.$$
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Notice how close the total curvature of the function h(x) and the clamped 
spline are. Now look at Figure 15.7, which plots both functions. They axe very 
similar in shape. Hence we would expect them to have similar curvature. Of 
course, as a result of Theorem 15.5, the curvature of the spline should be less, 
though in this case it is only slightly less. In Exercise 15.9 you axe asked to 
calculate the total curvature of the corresponding natural spline (which also 
appears in Figure 15.7). Because it is much “straighter，” you would expect its 
total curvature to be significantly less, which is confirmed in Exercise 15.9. □ 
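The two curvature values in Example 15.6 can be checked with exact rational arithmetic. The sketch below is only an illustration: the helper name `integral_of_squared_linear` is hypothetical, and it relies on the fact that the second derivative of a cubic piece is linear, so its square integrates in closed form.

```python
from fractions import Fraction as F

def integral_of_squared_linear(c0, c1, lo, hi):
    """Exact integral of (c0 + c1*x)**2 over [lo, hi]."""
    def antiderivative(x):
        # (c0 + c1*x)^2 = c0^2 + 2*c0*c1*x + c1^2*x^2
        return c0 * c0 * x + c0 * c1 * x * x + c1 * c1 * x ** 3 / 3
    return antiderivative(hi) - antiderivative(lo)

# Second derivative of the clamped spline on each interval:
# 35.5 - 8.625x on [2, 4] and 1.5x - 5 on [4, 5].
spline_curvature = (
    integral_of_squared_linear(F("35.5"), F("-8.625"), F(2), F(4))
    + integral_of_squared_linear(F(-5), F("1.5"), F(4), F(5))
)

# h(x) = 100/x has h''(x) = 200/x^3, so the integrand is 40000/x^6,
# whose antiderivative is -8000/x^5.
h_curvature = F(8000) * (F(1, 2 ** 5) - F(1, 5 ** 5))

print(float(spline_curvature))  # 238.125
print(float(h_curvature))       # 247.44
```

As Theorem 15.5 requires, the spline value is the smaller of the two.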

15.4.1 Exercise 

15.9 For the natural cubic spline obtained in Exercise 15.3 calculate the value 
of the squared norm measure of curvature. 


15.5 EXTRAPOLATING WITH SPLINES 

In many applications we may want to produce a model that can be faithful to a 
set of historical data but that can also be used to forecast into the future. For 
example, in determining liabilities of an insurer when future claim payments 
are subject to inflationary growth, the actuary may need to project the rate 
of future claims inflation for some 5 to 10 years into the future. One way to 
do this is by fitting a function, in this case a cubic spline, to historic claims 
inflation data. 

Simply projecting the cubic in the last interval beyond $x_n$ may result in
excessive oscillatory behavior in the region beyond $x_n$. This could result in
projected values that are wildly unreasonable. It makes much more sense

The natural cubic spline has endpoint conditions that require the second
derivatives to be zero at the endpoints. The natural extrapolation is linear
with the slope coming from the endpoints. Of course, the linear extrapolation
function can be done for any spline using the first derivative at the
endpoints. However, unless the second derivative is zero, as with the natural
spline, the second derivative condition will be violated at the endpoints. The
extrapolated values at each end are then

$$f(x) = f(x_n) + f'(x_n)(x - x_n), \quad x > x_n,$$

$$f(x) = f(x_0) - f'(x_0)(x_0 - x), \quad x < x_0.$$

Example 15.7 Obtain formulas for the extrapolated values for the clamped
spline in Example 15.3 and determine the extrapolated values at $x = 0$ and
$x = 7$.

In the first interval, $f(x) = 50 - 25(x-2) + 9.125(x-2)^2 - 1.4375(x-2)^3$
and so $f(2) = 50$ and $f'(2) = -25$. Then, for $x < 2$, the extrapolation
is $f(x) = 50 - (-25)(2 - x) = 100 - 25x$. In the final interval, $f(x) =
25 - 5.75(x-4) + 0.5(x-4)^2 + 0.25(x-4)^3$ and so $f(5) = 20$ and $f'(5) = -4$.
Then, for $x > 5$, the extrapolation is $f(x) = 20 - 4(x - 5) = 40 - 4x$. At $x = 0$
the extrapolated value is $100 - 25(0) = 100$ and at $x = 7$ it is $40 - 4(7) = 12$. □
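The linear extrapolation rule is short enough to sketch directly. The function name `linear_extrapolation` and its argument order are illustrative choices, not notation from the text; the end values and slopes passed in are the ones obtained in Example 15.7.

```python
def linear_extrapolation(x, x0, f0, slope0, xn, fn, slopen):
    """Extend a spline linearly beyond its end knots x0 < xn.

    f0, slope0 are the spline value and first derivative at x0;
    fn, slopen are the value and first derivative at xn.
    """
    if x < x0:
        return f0 - slope0 * (x0 - x)
    if x > xn:
        return fn + slopen * (x - xn)
    raise ValueError("x lies inside [x0, xn]; evaluate the spline itself")

# End values and slopes of the clamped spline from Example 15.7.
print(linear_extrapolation(0, 2, 50, -25, 5, 20, -4))  # 100
print(linear_extrapolation(7, 2, 50, -25, 5, 20, -4))  # 12
```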

15.5.1 Exercise 

15.10 Obtain formulas for the extrapolated values for the natural spline in 
Exercise 15.3 and determine the extrapolated values at rr = 0 and x = 7. 

15.6 SMOOTHING SPLINES 

In many actuarial applications, it may be desirable to do more than interpolate
between observed data. If data include a random (or "noise") element, it is
often best to allow the cubic spline or other smooth function to lie near the
data points, rather than requiring the function to pass through each data
point.

In the terminology of graduation theory as developed by actuaries in the
early 1900s, this is called modified osculatory interpolation. The term
modified is added to recognize that the points of intersection (or knots in the
language of splines) are modified from the original data points.

The technical development of smoothing cubic splines is identical to
interpolating cubic splines except that the original knots at each data point
$(x_j, y_j)$ are replaced by knots at $(x_j, a_j)$ where the ordinate $a_j$ is the
constant term in the smoothing cubic spline

$$f_j(x) = a_j + b_j(x - x_j) + c_j(x - x_j)^2 + d_j(x - x_j)^3. \tag{15.25}$$

We first imagine that the ordinates of original data points are the outcomes
of the model

$$y_j = g(x_j) + \epsilon_j,$$

where $\epsilon_j$, $j = 0, 1, \ldots, n$, are independently distributed random variables with
mean 0 and variance $\sigma_j^2$ and where $g(x)$ is a well-behaved function.³

Example 15.8 Mortality rates $q_j$ at each age $j$ are estimated by the ratio of
observed deaths to the number of life-years of exposure $D_j/n_j$ where $D_j$ is
a binomial$(n_j, q_j)$ random variable. The estimator $\hat{q}_j = d_j/n_j$, where $d_j$ is
the observed number of deaths, has variance $\sigma_j^2 = q_j(1 - q_j)/n_j$, which can be
estimated by $\hat{q}_j(1 - \hat{q}_j)/n_j$.

We attempt to find a smooth function $f(x)$, in this case a cubic spline,
that will serve as an approximation to the "true" function $g(x)$. Because $g(x)$
is assumed to be well behaved, we will require the smoothing cubic spline
$f(x)$ itself to be as smooth as possible. On the other hand, we want it to be
faithful to the given data as much as possible. These are conflicting objectives.
Therefore, a compromise will be necessary between fit and smoothness.

The degree of fit can be measured using the chi-square criterion

$$F = \sum_{j=0}^{n} \left(\frac{y_j - a_j}{\sigma_j}\right)^2.$$

This is a standard statistical criterion for measuring the degree of fit and was
discussed in that context in Section 13.4.3. It has a chi-square distribution
with $n + 1$ degrees of freedom.⁴

The degree of smoothness can be measured by the overall smoothness of
the cubic spline. The smoothness, or equivalently the total curvature, can be
measured by the squared norm smoothness criterion

$$S = \int_{x_0}^{x_n} [f''(x)]^2\,dx.$$


It was shown in Theorem 15.5 that within the broad class of functions with 
continuous first and second derivatives, the natural or clamped cubic spline 
minimizes the squared norm. This supports the choice of the cubic spline as 
the smoothing function. 

In order to recognize the conflicting objectives of fit and smoothness, we
construct a criterion which is a weighted average of the measures of fit and
smoothness. Let

$$L = pF + (1-p)S = p\sum_{j=0}^{n}\left(\frac{y_j - a_j}{\sigma_j}\right)^2 + (1-p)\int_{x_0}^{x_n} [f''(x)]^2\,dx.$$

The parameter $p$ reflects the relative importance which we give to the
conflicting objectives of remaining close to the data, on the one hand, and of
obtaining a smooth curve, on the other hand. Notice that a linear function
has $f''(x) = 0$ and therefore $S = 0$,
which suggests that, in the limiting case, where $p = 0$ and thus smoothness is
all that matters, the spline function $f(x)$ will become a straight line. At the
other extreme, where $p = 1$ and thus the closeness of the spline to the data is
all that matters, we will obtain an interpolating spline which passes exactly
through the data points.

The spline is piecewise cubic and thus the smoothness criterion can be
written

$$S = \int_{x_0}^{x_n} [f''(x)]^2\,dx = \sum_{j=0}^{n-1}\int_{x_j}^{x_{j+1}} [f_j''(x)]^2\,dx.$$

From (15.5),

$$f_j''(x) = \frac{m_j}{h_j}(x_{j+1} - x) + \frac{m_{j+1}}{h_j}(x - x_j),$$

and then

$$\begin{aligned}
\int_{x_j}^{x_{j+1}} [f_j''(x)]^2\,dx
&= \int_{x_j}^{x_{j+1}} \left[\frac{m_j}{h_j}(x_{j+1} - x) + \frac{m_{j+1}}{h_j}(x - x_j)\right]^2 dx\\
&= \int_0^1 [m_j(1 - y) + m_{j+1}y]^2 h_j\,dy\\
&= h_j\int_0^1 [m_j + (m_{j+1} - m_j)y]^2\,dy\\
&= h_j\,\frac{[m_j + (m_{j+1} - m_j)y]^3}{3(m_{j+1} - m_j)}\Bigg|_0^1\\
&= \frac{h_j}{3}\left(m_j^2 + m_jm_{j+1} + m_{j+1}^2\right),
\end{aligned}$$

where the substitution $y = (x - x_j)/h_j$ is used in the second line. The criterion
function then becomes

$$L = p\sum_{j=0}^{n}\left(\frac{y_j - a_j}{\sigma_j}\right)^2 + (1-p)\sum_{j=0}^{n-1}\frac{h_j}{3}\left(m_j^2 + m_jm_{j+1} + m_{j+1}^2\right).$$

We need to minimize this function with respect to the $2n + 2$ unknown
quantities $\{a_j, m_j;\ j = 0, \ldots, n\}$. Note that when we have solved for these
variables we will have four pieces of information $\{a_j, a_{j+1}, m_j, m_{j+1}\}$ for each
interval $[x_j, x_{j+1}]$. This allows us to fully specify the interpolating cubic spline
in each interval. We now address the issue of solving for these quantities.

We now consider the natural smoothing spline. The equations developed
for interpolating splines apply to smoothing cubic splines except that the
$y_j$'s are replaced by $a_j$'s to recognize that the ordinates $\{a_j;\ j = 0, \ldots, n\}$ of
the smoothing spline do not coincide with the ordinates of the data points
$\{y_j;\ j = 0, \ldots, n\}$. From (15.16), we can write

$$\mathbf{Hm} = \mathbf{u},$$




















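$$L = p(\mathbf{y} - \mathbf{a})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) + \tfrac{1}{6}(1-p)\mathbf{m}^T\mathbf{Hm},$$

where $\boldsymbol{\Sigma} = \mathrm{diag}\{\sigma_0^2, \sigma_1^2, \ldots, \sigma_n^2\}$. Because $\mathbf{m} = \mathbf{H}^{-1}\mathbf{Ra}$, we can rewrite the
criterion as

$$L = p(\mathbf{y} - \mathbf{a})^T\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) + \tfrac{1}{6}(1-p)\mathbf{a}^T\mathbf{R}^T\mathbf{H}^{-1}\mathbf{Ra}.$$

We can differentiate the criterion $L$ with respect to each of $a_0, a_1, \ldots, a_n$
successively to obtain the optimal values of the ordinates. In matrix notation,
the result is (after dividing the derivative by 2)

$$-p(\mathbf{y} - \mathbf{a})^T\boldsymbol{\Sigma}^{-1} + \tfrac{1}{6}(1-p)\mathbf{a}^T\mathbf{R}^T\mathbf{H}^{-1}\mathbf{R} = \mathbf{0}^T,$$

where $\mathbf{0}$ is the $(n + 1) \times 1$ vector of zeros, $(0, \ldots, 0)^T$. This yields, after
transposition,

$$6p\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) = (1-p)\mathbf{R}^T\mathbf{H}^{-1}\mathbf{Ra}$$

or

$$6p\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) = (1-p)\mathbf{R}^T\mathbf{m}. \tag{15.28}$$

We now premultiply by $\mathbf{R}\boldsymbol{\Sigma}$, yielding

$$6p\mathbf{R}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \mathbf{a}) = (1-p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m}$$

or

$$6p(\mathbf{Ry} - \mathbf{Ra}) = (1-p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m}. \tag{15.29}$$

Because $\mathbf{Hm} = \mathbf{Ra}$, this reduces to

$$p\mathbf{Ry} - p\mathbf{Hm} = \tfrac{1}{6}(1-p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m}$$

or

$$\left(p\mathbf{H} + \tfrac{1}{6}(1-p)\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\right)\mathbf{m} = p\mathbf{Ry}. \tag{15.30}$$

This is a system of $n - 1$ equations in $n - 1$ unknowns. The system of equations
can be solved for $m_1, m_2, \ldots, m_{n-1}$. Using matrix methods, the solution can
be obtained from (15.30) as

$$\mathbf{m} = \left(\mathbf{H} + \frac{1-p}{6p}\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\right)^{-1}\mathbf{Ry}. \tag{15.31}$$

Now, the values of $a_0, a_1, \ldots, a_n$ can be obtained by rewriting (15.28) as

$$\mathbf{a} = \mathbf{y} - \frac{1-p}{6p}\boldsymbol{\Sigma}\mathbf{R}^T\mathbf{m}. \tag{15.32}$$

Finally, substitution of (15.31) into (15.32) results in

$$\mathbf{a} = \mathbf{y} - \frac{1-p}{6p}\boldsymbol{\Sigma}\mathbf{R}^T\left(\mathbf{H} + \frac{1-p}{6p}\mathbf{R}\boldsymbol{\Sigma}\mathbf{R}^T\right)^{-1}\mathbf{Ry}. \tag{15.33}$$

Thus we have obtained the values of the intercepts of the $n$ cubic spline
segments of the smoothing spline. The values of the other coefficients of the
spline segments can now be calculated in the same way as for the natural
interpolating spline, as discussed in Section 15.3, using the knots $\{(x_j, a_j),
j = 0, \ldots, n\}$ and setting $m_0 = m_n = 0$. It should be noted that the only
additional calculation for the natural smoothing spline as compared with the
natural interpolation spline is given by (15.33).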
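The matrix solution in (15.31) and (15.32) can be sketched with NumPy. The definitions of $\mathbf{H}$ and $\mathbf{R}$ do not survive in this passage, so the code assumes the usual natural-spline system: row $j$ of $\mathbf{H}$ holds $h_{j-1}$, $2(h_{j-1} + h_j)$, $h_j$, and row $j$ of $\mathbf{R}$ holds $6/h_{j-1}$, $-6(1/h_{j-1} + 1/h_j)$, $6/h_j$. That choice is consistent with the smoothness term $\tfrac{1}{6}\mathbf{m}^T\mathbf{Hm}$ used above, but it is an assumption, as are the function name `natural_smoothing_spline` and the argument name `sigma2`.

```python
import numpy as np

def natural_smoothing_spline(x, y, sigma2, p):
    """Return (a, m): smoothed ordinates at the knots and the second
    derivatives at the interior knots, for 0 < p <= 1 and n >= 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) - 1                 # knots are x_0, ..., x_n
    h = np.diff(x)                 # h_j = x_{j+1} - x_j

    # Assumed forms of H ((n-1) x (n-1)) and R ((n-1) x (n+1)).
    H = np.zeros((n - 1, n - 1))
    R = np.zeros((n - 1, n + 1))
    for i in range(n - 1):         # row i belongs to interior knot j = i + 1
        j = i + 1
        H[i, i] = 2.0 * (h[j - 1] + h[j])
        if i > 0:
            H[i, i - 1] = h[j - 1]
        if i < n - 2:
            H[i, i + 1] = h[j]
        R[i, j - 1] = 6.0 / h[j - 1]
        R[i, j] = -6.0 * (1.0 / h[j - 1] + 1.0 / h[j])
        R[i, j + 1] = 6.0 / h[j]

    Sigma = np.diag(np.asarray(sigma2, dtype=float))
    c = (1.0 - p) / (6.0 * p)
    m = np.linalg.solve(H + c * (R @ Sigma @ R.T), R @ y)  # (15.31)
    a = y - c * (Sigma @ R.T @ m)                          # (15.32)
    return a, m

# Illustrative data (not from the text), unit variances, heavy smoothing.
a, m = natural_smoothing_spline([0, 1, 2, 3, 4, 5],
                                [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
                                [1.0] * 6, 0.001)
print(np.round(a, 3))
```

With `p = 1` the function returns the data unchanged (interpolation); as `p` approaches 0 the ordinates approach a straight line. The value `p = 0` itself is excluded because (15.31) divides by `p`.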

The magnitude of the values of the criteria for fit $F$ and smoothness $S$
may be very different. Therefore one should not place any significance on the
specific choice of the value of $p$ (unless it is 0 or 1). Smaller values of $p$ result
in more smoothing; larger values result in less. In some applications it may
be necessary to make the value of $p$ very small, for example, 0.001, to begin
to get visual images with any significant amount of smoothing. This is, in
part, due to the role of the variances which appear in the denominator of the
fit criterion. Small variances can result in the fit term being much larger than
the smoothness term. Therefore, it may be necessary to have a very small
value for $p$ to get any visible smoothing.

Fig. 15.13 Smoothing spline for mortality data with $p = 0.1$.

Example 15.10 demonstrated the smoothing capability of splines. However,
one still needs to choose a value of $p$. In practice, this is done using professional

















Credibility 


16.1 INTRODUCTION 

Credibility theory is a set of quantitative tools which allows an insurer to 
perform prospective experience rating (adjust future premiums based on past 
experience) on a risk or group of risks. If the experience of a policyholder is 
consistently better than that assumed in the underlying manual rate (sometimes
called the pure premium), then the policyholder may demand a rate
reduction. 

The policyholder's argument is as follows: The manual rate is designed
more credible the policyholder's own (
In the same vein, in group insurance tl
more credible than that of smaller grot


discussed in Section 16.4.4. Practical improvements were made by Bühlmann
and Straub in 1970 [20]. Their model is discussed in Section 16.4.5. The
concept of exact credibility is presented in Section 16.4.6.

Practical use of the theory requires that unknown model parameters be
estimated from data. Nonparametric estimation (where the problem is somewhat
model free and the parameters are generic, such as the mean and variance)
considered


2. Competitive considerations may force 


nearly full credibility to a given policyholder in order to retain the busi- 


Another use for credibility is in the setting of rates for classification systems. 




to accurately estimate the expected cost for insuring these classes, it may 
be appropriate to combine the limited actual experience with some other 
information, such as past rates, or the experience of occupations that are 
closely related. 

From a statistical perspective, credibility theory leads to a result that would
appear to be counterintuitive. If experience from an insured or group of
itistical training may convince us to use the sample 
3d estimator. But credibility theory tells us that it 
al weight to this experience and give the remaining 
uced from other information. We will discover that 
of bias we gain in terms of reducing the average 


It is at this point in the discussion that the ordinary individual has to
admit that, while there seems to be some hazy logic behind the actuaries'
contentions, it is too obscure for him to understand. The trained
statistician cries "Absurd! Directly contrary to any of the accepted
theories of statistical estimation." The actuaries themselves have to admit
that they have gone beyond anything that has been proven mathematically,
that all of the values involved are still selected on the basis of
judgment, and that the only demonstration they can make is that, in
actual practice, it works. Let us not forget, however, that they have
made this demonstration many times. It does work!


3 an insurer to quantitatively formulate the above 
provides an introduction to this theory. A few 
;s are reviewed in the next section. Some topics 
.2 and 12.4 are repeated and there axe some new 
， ell. 

limited fluctuation credibility theory, a sub- 
f part of the twentieth century. This provides a 
II (Section 16.3.1) or partial (Section 16.3.2) cred- 
xperience. The difficulty with this approach is the 
5 mathematical theory justifying the use of these 
s approach provided the original treatment of the 
oday. 

nann in 1967 [18] provided a statistical framework 
eorv has developed and flourished. While this ap- 


16.2 STATISTICAL CONCEPTS 


In this section various statistical concepts relevant to credibility theory are 
presented. Much of the material is of a review nature and hence may be 
quickly glossed over by a reader with a good background in statistics.
Nevertheless, there may be some material which may not have been seen before,
and so this section should not be completely ignored. Subsequent sections 
will refer back to this material. 


¹The terms limited fluctuation and greatest accuracy go back at least as far as a 1943 paper
by Arthur Bailey [7].















16.2.1 Conditional distributions 


$f_X(x)$ and $f_Y(y)$, respectively. The conditional pf of $X$ given that $Y = y$ is

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}. \tag{16.1}$$

If $X$ and $Y$ are discrete random variables, then (16.1) is the conditional probability

are independent random variables.

$$f_{X,Y}(x, y) = f_X(x)f_Y(y),$$

and in this case, (16.1) yields

$$f_{X|Y}(x|y) = f_X(x),$$


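Example 16.1 Suppose $X$ and $Z$ are independent Poisson random variables
with means $\lambda_1$ and $\lambda_2$, respectively. Let $Y = X + Z$. Demonstrate that
$X|Y = y$ is binomial with parameters $m = y$ and $q = \lambda_1/(\lambda_1 + \lambda_2)$ (see, for
example, [58], p. 131).

The conditional distribution of $X$ given that $Y = y$ is

$$\begin{aligned}
f_{X|Y}(x|y) &= \frac{\Pr(X = x, Y = y)}{\Pr(Y = y)}\\
&= \frac{\Pr(X = x, Z = y - x)}{\Pr(Y = y)}\\
&= \frac{\Pr(X = x)\Pr(Z = y - x)}{\Pr(Y = y)}\\
&= \frac{\dfrac{\lambda_1^x e^{-\lambda_1}}{x!}\,\dfrac{\lambda_2^{y-x} e^{-\lambda_2}}{(y - x)!}}{\dfrac{(\lambda_1 + \lambda_2)^y e^{-\lambda_1 - \lambda_2}}{y!}}\\
&= \frac{y!}{x!(y - x)!}\left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^x\left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{y-x}.
\end{aligned}$$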
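The identity in Example 16.1 is easy to confirm numerically. In the sketch below the Poisson means and the conditioning value `y` are illustrative values, not taken from the text, and the function names are hypothetical helpers.

```python
from math import comb, exp, factorial

def poisson_pf(k, lam):
    """Pr(N = k) for a Poisson random variable with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)

def conditional_pf(x, y, lam1, lam2):
    """Pr(X = x | X + Z = y) for independent Poisson X and Z."""
    return (poisson_pf(x, lam1) * poisson_pf(y - x, lam2)
            / poisson_pf(y, lam1 + lam2))

def binomial_pf(x, m, q):
    """Pr(X = x) for a binomial random variable with parameters m and q."""
    return comb(m, x) * q ** x * (1 - q) ** (m - x)

lam1, lam2, y = 2.0, 3.0, 6   # illustrative values only
for x in range(y + 1):
    lhs = conditional_pf(x, y, lam1, lam2)
    rhs = binomial_pf(x, y, lam1 / (lam1 + lam2))
    print(x, round(lhs, 6), round(rhs, 6))
```

The two columns agree for every `x` from 0 to `y`, as the algebra predicts.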















Note that the roles of $X$ and $Y$ in (16.2) can be interchanged, yielding

$$f_{X|Y}(x|y)f_Y(y) = f_{Y|X}(y|x)f_X(x),$$

Division by $f_Y(y)$ yields Bayes' theorem, namely,

$$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x)f_X(x)}{f_Y(y)}.$$

16.2.2 Conditional expectation 

As in the previous subsection, assume that $X$ and $Y$ are two random variables
and the conditional pf of $X$ given that $Y = y$ is $f_{X|Y}(x|y)$. Clearly, this is a
valid probability distribution, and its mean is denoted by

$$E(X|Y = y) = \int x f_{X|Y}(x|y)\,dx, \tag{16.4}$$

with the integral replaced by a sum in the discrete case. Clearly, (16.4) is a
function of $y$, and it is often of interest to view this conditional expectation
as a random variable obtained by replacing $y$ by $Y$ in the right-hand side of
(16.4). Thus we can write $E(X|Y)$ instead of the left-hand side of (16.4), and
so $E(X|Y)$ is itself a random variable because it is a function of the random
variable $Y$. The expectation of $E(X|Y)$ is given by

$$E[E(X|Y)] = E(X). \tag{16.5}$$

To see this, note that from (16.3) and (16.4)

$$\begin{aligned}
E[E(X|Y)] &= \int E(X|Y = y)f_Y(y)\,dy\\
&= \int\!\!\int x f_{X|Y}(x|y)\,dx\,f_Y(y)\,dy\\
&= \int x\int f_{X|Y}(x|y)f_Y(y)\,dy\,dx\\
&= \int x f_X(x)\,dx\\
&= E(X),
\end{aligned}$$

with a similar proof in the discrete case.

Example 16.2 Derive the mean of the negative binomial distribution by
conditional expectation, recalling that, if $X|\Theta \sim$ Poisson$(\Theta)$ and $\Theta \sim$ gamma$(\alpha, \beta)$,
then $X \sim$ negative binomial with $r = \alpha$ and $\beta = \beta$.

We have

$$E(X|\Theta) = \Theta$$

and so

$$E(X) = E[E(X|\Theta)] = E(\Theta).$$

From Appendix A the mean of the gamma distribution of $\Theta$ is $\alpha\beta$, and so
$E(X) = \alpha\beta$. □

It is often convenient to replace $X$ by an arbitrary function $h(X, Y)$ in
(16.4), yielding the more general definition

$$E[h(X, Y)|Y = y] = \int h(x, y)f_{X|Y}(x|y)\,dx.$$

Similarly, $E[h(X, Y)|Y]$ is the conditional expectation viewed as a random
variable which is a function of $Y$. Then, (16.5) generalizes to

$$E\{E[h(X, Y)|Y]\} = E[h(X, Y)]. \tag{16.6}$$

To see (16.6), note that

$$\begin{aligned}
E\{E[h(X, Y)|Y]\} &= \int E[h(X, Y)|Y = y]f_Y(y)\,dy\\
&= \int\!\!\int h(x, y)f_{X|Y}(x|y)\,dx\,f_Y(y)\,dy\\
&= \int\!\!\int h(x, y)[f_{X|Y}(x|y)f_Y(y)]\,dx\,dy\\
&= \int\!\!\int h(x, y)f_{X,Y}(x, y)\,dx\,dy\\
&= E[h(X, Y)]
\end{aligned}$$

from (16.2).

If we choose $h(X, Y) = [X - E(X|Y)]^2$, then its expected value, based on
the conditional distribution of $X$ given $Y$, is the variance of this conditional
distribution,

$$\mathrm{Var}(X|Y) = E\{[X - E(X|Y)]^2|Y\}. \tag{16.7}$$

Clearly, (16.7) is still a function of the random variable $Y$.

It is instructive now to analyze the variance of $X$ where $X$ and $Y$ are two
random variables. To begin, note that (16.7) may be written as

$$\mathrm{Var}(X|Y) = E(X^2|Y) - [E(X|Y)]^2.$$

Thus,

$$\begin{aligned}
E[\mathrm{Var}(X|Y)] &= E\{E(X^2|Y) - [E(X|Y)]^2\}\\
&= E[E(X^2|Y)] - E\{[E(X|Y)]^2\}\\
&= E(X^2) - E\{[E(X|Y)]^2\}.
\end{aligned}$$

Also, because $\mathrm{Var}[h(Y)] = E\{[h(Y)]^2\} - \{E[h(Y)]\}^2$, we may use $h(Y) =
E(X|Y)$ to obtain

$$\begin{aligned}
\mathrm{Var}[E(X|Y)] &= E\{[E(X|Y)]^2\} - \{E[E(X|Y)]\}^2\\
&= E\{[E(X|Y)]^2\} - [E(X)]^2.
\end{aligned}$$

Thus,

$$\begin{aligned}
E[\mathrm{Var}(X|Y)] + \mathrm{Var}[E(X|Y)] &= E(X^2) - E\{[E(X|Y)]^2\} + E\{[E(X|Y)]^2\} - [E(X)]^2\\
&= E(X^2) - [E(X)]^2\\
&= \mathrm{Var}(X).
\end{aligned}$$

Thus, we have established the important formula

$$\mathrm{Var}(X) = E[\mathrm{Var}(X|Y)] + \mathrm{Var}[E(X|Y)]. \tag{16.8}$$

Formula (16.8) states that the variance of $X$ is composed of the sum of two
parts: the mean of the conditional variance plus the variance of the conditional
mean.
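Formulas (16.5) and (16.8) can be verified exactly on a small discrete example. The joint pf below is hypothetical, chosen only so that the arithmetic stays in simple fractions; the helper names are likewise illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint pf Pr(X = x, Y = y), keyed by (x, y).
joint = {
    (0, 0): F(1, 4), (1, 0): F(1, 4),
    (0, 1): F(1, 10), (2, 1): F(2, 5),
}

def expect(g):
    """E[g(X, Y)] under the joint pf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def cond_moment(y, k):
    """E(X**k | Y = y)."""
    f_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(p * x ** k for (x, yy), p in joint.items() if yy == y) / f_y

def cond_mean(y):
    return cond_moment(y, 1)

def cond_var(y):
    return cond_moment(y, 2) - cond_moment(y, 1) ** 2

mean_x = expect(lambda x, y: x)
var_x = expect(lambda x, y: x * x) - mean_x ** 2

# Right-hand side of (16.8): E[Var(X|Y)] + Var[E(X|Y)].
e_cond_var = expect(lambda x, y: cond_var(y))
var_cond_mean = (expect(lambda x, y: cond_mean(y) ** 2)
                 - expect(lambda x, y: cond_mean(y)) ** 2)

print(mean_x, expect(lambda x, y: cond_mean(y)))   # both sides of (16.5)
print(var_x, e_cond_var + var_cond_mean)           # both sides of (16.8)
```

Because `Fraction` arithmetic is exact, the two sides of each identity print as identical values rather than merely close ones.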

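Example 16.3 Derive the variance of the negative binomial distribution.

The Poisson distribution has equal mean and variance, that is,

$$E(X|\Theta) = \mathrm{Var}(X|\Theta) = \Theta,$$

and so, from (16.8),

$$\begin{aligned}
\mathrm{Var}(X) &= E[\mathrm{Var}(X|\Theta)] + \mathrm{Var}[E(X|\Theta)]\\
&= E(\Theta) + \mathrm{Var}(\Theta).
\end{aligned}$$

Because $\Theta$ itself has a gamma distribution with parameters $\alpha$ and $\beta$, $E(\Theta) =
\alpha\beta$ and $\mathrm{Var}(\Theta) = \alpha\beta^2$. Thus the variance of the negative binomial distribution
is

$$\begin{aligned}
\mathrm{Var}(X) &= E(\Theta) + \mathrm{Var}(\Theta)\\
&= \alpha\beta + \alpha\beta^2\\
&= \alpha\beta(1 + \beta).
\end{aligned}$$ □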
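The results of Examples 16.2 and 16.3 can be checked by summing the negative binomial probabilities directly. This sketch assumes the parameterization with $\Pr(X = k) = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}\left(\frac{\beta}{1+\beta}\right)^k\left(\frac{1}{1+\beta}\right)^r$; the parameter values and the truncation point of 400 terms are illustrative choices.

```python
from math import exp, lgamma, log

def negative_binomial_pf(k, r, beta):
    """Pr(X = k) for the negative binomial with parameters r and beta."""
    log_p = (lgamma(r + k) - lgamma(r) - lgamma(k + 1)
             + k * log(beta / (1 + beta)) - r * log(1 + beta))
    return exp(log_p)

alpha, beta = 2.5, 1.5   # illustrative gamma parameters, so r = alpha
probs = [negative_binomial_pf(k, alpha, beta) for k in range(400)]
mean = sum(k * p for k, p in enumerate(probs))
variance = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2

print(mean, alpha * beta)                    # both close to 3.75
print(variance, alpha * beta * (1 + beta))   # both close to 9.375
```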

Example 16.4 It was shown in Example 4.30 that, if $X|\Theta$ is normally
distributed with mean $\Theta$ and variance $v$ where $\Theta$ is itself normally distributed
with mean $\mu$ and variance $a$, then $X$ (unconditionally) is normally distributed
with mean $\mu$ and variance $a + v$. Use (16.5) and (16.8) to obtain the mean
and variance of $X$ directly.

For the mean we have

$$E(X) = E[E(X|\Theta)] = E(\Theta) = \mu$$

and for the variance we obtain

$$\begin{aligned}
\mathrm{Var}(X) &= E[\mathrm{Var}(X|\Theta)] + \mathrm{Var}[E(X|\Theta)]\\
&= E(v) + \mathrm{Var}(\Theta)\\
&= v + a
\end{aligned}$$

because $v$ is a constant. □

Example 16.5 Consider a compound Poisson distribution with Poisson mean
$\lambda$, where $X = Y_1 + \cdots + Y_N$ with $E(Y_j) = \mu_Y$ and $\mathrm{Var}(Y_j) = \sigma_Y^2$. Determine
the mean and variance of $X$.

Formula (16.8) was used in Chapter 6 to obtain the answers:

$$E(X) = \lambda\mu_Y \quad\text{and}\quad \mathrm{Var}(X) = \lambda(\mu_Y^2 + \sigma_Y^2).$$ □

16.2.3 Nonparametric unbiased estimators 

Unbiased estimation was covered in Section 9.2.2. It plays an important role 
in the development of credibility formulas. We begin by showing that two 
commonly used estimators are unbiased. 

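Theorem 16.6 If $X_1, \ldots, X_n$ are independent but not necessarily identically
distributed with common mean $\mu = E(X_j)$ and common variance $v = \mathrm{Var}(X_j)$,
then

$$\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$$

is an unbiased estimator of $\mu$ and

$$\hat{v} = \frac{1}{n-1}\sum_{j=1}^{n}(X_j - \bar{X})^2 \tag{16.9}$$

is an unbiased estimator of $v$.

Proof: For $\bar{X}$, we have

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n}\sum_{j=1}^{n} E(X_j) = \mu.$$

For the variance estimator, begin with the following result, which will be used
later:

$$\begin{aligned}
\sum_{j=1}^{n}(X_j - \bar{X})^2 &= \sum_{j=1}^{n}(X_j - \mu + \mu - \bar{X})^2\\
&= \sum_{j=1}^{n}(X_j - \mu)^2 + 2\sum_{j=1}^{n}(X_j - \mu)(\mu - \bar{X}) + \sum_{j=1}^{n}(\mu - \bar{X})^2\\
&= \sum_{j=1}^{n}(X_j - \mu)^2 + 2(\mu - \bar{X})\sum_{j=1}^{n}(X_j - \mu) + n(\mu - \bar{X})^2\\
&= \sum_{j=1}^{n}(X_j - \mu)^2 + 2(\mu - \bar{X})n(\bar{X} - \mu) + n(\mu - \bar{X})^2\\
&= \sum_{j=1}^{n}(X_j - \mu)^2 - n(\bar{X} - \mu)^2.
\end{aligned} \tag{16.10}$$

We also have (from the independence of $X_1, \ldots, X_n$),

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{j=1}^{n} X_j\right) = \frac{1}{n^2}\sum_{j=1}^{n}\mathrm{Var}(X_j) = \frac{v}{n}.$$

Take expectations in (16.10) to obtain

$$\begin{aligned}
E\left[\sum_{j=1}^{n}(X_j - \bar{X})^2\right] &= E\left[\sum_{j=1}^{n}(X_j - \mu)^2\right] - nE[(\bar{X} - \mu)^2]\\
&= \sum_{j=1}^{n}E[(X_j - \mu)^2] - n\mathrm{Var}(\bar{X})\\
&= \sum_{j=1}^{n}\mathrm{Var}(X_j) - n\frac{v}{n}\\
&= (n-1)v,
\end{aligned}$$

so that dividing by $n - 1$ gives $E(\hat{v}) = v$.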

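Show that all three estimators are unbiased for $\mu$ and then rank them in order
by mean-squared error. Also obtain the expected value of a sum of squares
that may be useful for estimating $\alpha$ and $\beta$.

First consider $\bar{X}$.

$$E(\bar{X}) = m^{-1}\sum_{j=1}^{n} m_jE(X_j) = m^{-1}\sum_{j=1}^{n} m_j\mu = \mu,$$

$$\begin{aligned}
\mathrm{Var}(\bar{X}) &= m^{-2}\sum_{j=1}^{n} m_j^2\,\mathrm{Var}(X_j)\\
&= m^{-2}\sum_{j=1}^{n} m_j^2\left(\beta + \frac{\alpha}{m_j}\right)\\
&= \alpha m^{-1} + \beta m^{-2}\sum_{j=1}^{n} m_j^2.
\end{aligned}$$

The estimator $\hat{\mu}_1$ is the one defined in Theorem 16.6 and has already been
shown to be unbiased. We also have

$$\begin{aligned}
\mathrm{Var}(\hat{\mu}_1) &= n^{-2}\sum_{j=1}^{n}\mathrm{Var}(X_j)\\
&= n^{-2}\sum_{j=1}^{n}\left(\beta + \frac{\alpha}{m_j}\right)\\
&= \beta n^{-1} + n^{-2}\alpha\sum_{j=1}^{n} m_j^{-1}.
\end{aligned}$$

With regard to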
i=i 


: 寧 i) 


We now consider the relative ranking of these variances (because all three
estimators are unbiased, their mean-squared errors equal their variances, so
it is sufficient to rank the variances). To show that it is not possible to order
$\mathrm{Var}(\hat{\mu}_1)$ and $\mathrm{Var}(\bar{X})$, examine their difference:

$$\mathrm{Var}(\bar{X}) - \mathrm{Var}(\hat{\mu}_1) = \alpha\left(m^{-1} - n^{-2}\sum_{j=1}^{n} m_j^{-1}\right) + \beta\left(m^{-2}\sum_{j=1}^{n} m_j^2 - n^{-1}\right).$$

The coefficient of $\beta$ must be nonnegative. To see this, note that


It-nV-t 





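$2c_j\mathrm{Var}(X_j) + \lambda$,

and setting it equal to zero gives $c_j = -\lambda[2\mathrm{Var}(X_j)]^{-1}$. In other
words, the weights should be proportional to the reciprocal of the variance,
precisely the weights used in $\hat{\mu}_2$, and therefore it must have the

$$\begin{aligned}
\sum_{j=1}^{n} m_j(X_j - \bar{X})^2 &= \sum_{j=1}^{n} m_j(X_j - \mu + \mu - \bar{X})^2\\
&= \sum_{j=1}^{n} m_j(X_j - \mu)^2 + 2\sum_{j=1}^{n} m_j(X_j - \mu)(\mu - \bar{X}) + \sum_{j=1}^{n} m_j(\mu - \bar{X})^2\\
&= \sum_{j=1}^{n} m_j(X_j - \mu)^2 + 2(\mu - \bar{X})\sum_{j=1}^{n} m_j(X_j - \mu) + m(\mu - \bar{X})^2\\
&= \sum_{j=1}^{n} m_j(X_j - \mu)^2 + 2(\mu - \bar{X})m(\bar{X} - \mu) + m(\mu - \bar{X})^2\\
&= \sum_{j=1}^{n} m_j(X_j - \mu)^2 - m(\bar{X} - \mu)^2.
\end{aligned}$$

Taking expectations yields

$$\begin{aligned}
E\left[\sum_{j=1}^{n} m_j(X_j - \bar{X})^2\right] &= \sum_{j=1}^{n} m_jE[(X_j - \mu)^2] - mE[(\bar{X} - \mu)^2]\\
&= \sum_{j=1}^{n} m_j\mathrm{Var}(X_j) - m\mathrm{Var}(\bar{X})\\
&= \sum_{j=1}^{n} m_j\left(\beta + \frac{\alpha}{m_j}\right) - m\left(\beta m^{-2}\sum_{j=1}^{n} m_j^2 + \alpha m^{-1}\right),
\end{aligned}$$

and thus

$$E\left[\sum_{j=1}^{n} m_j(X_j - \bar{X})^2\right] = \beta\left(m - m^{-1}\sum_{j=1}^{n} m_j^2\right) + \alpha(n - 1). \tag{16.12}$$

In addition to being of interest in its own right, (16.12) provides an unbiased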

16.1 Suppose $X$ is binomially distributed with parameters $n_1$ and $p$, that is,

$$f_X(x) = \binom{n_1}{x}p^x(1-p)^{n_1-x}, \quad x = 0, 1, 2, \ldots, n_1.$$

Suppose also that $Z$ is binomially distributed with parameters $n_2$ and $p$
independently of $X$. Then $Y = X + Z$ is binomially distributed with parameters
$n_1 + n_2$ and $p$. Find the conditional distribution of $X$ given that $Y = y$.

16.2 Let $X$ and $Y$ have joint probability distribution as follows:

              x
    y     0      1      2
    0   0.20   0      0.10
    1   0      0.15   0.25
    2   0.05   0.15   0.10

(a) Compute the marginal distributions of $X$ and $Y$.

(b) Compute the conditional distribution of $X$ given $Y = y$ for $y = 0, 1, 2$.

(c) Compute $E(X|y)$, $E(X^2|y)$, and $\mathrm{Var}(X|y)$ for $y = 0, 1, 2$.

(d) Compute $E(X)$ and $\mathrm{Var}(X)$ using (16.5), (16.8), and (c).

$$E(X|Y = y) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2).$$

(b) The marginal pdf is

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\left[-\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2\right].$$

(c) The variables $X$ and $Y$ are independent if and only if $\rho = 0$.

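16.4 Suppose that the random variables $Y_1, \ldots, Y_n$ are independent with
$E(Y_j) = \gamma$ and $\mathrm{Var}(Y_j) = a_j + \sigma^2/b_j$, $j = 1, 2, \ldots, n$.
Define $b = b_1 + b_2 + \cdots + b_n$ and $\bar{Y} = \sum_{j=1}^{n}\frac{b_j}{b}Y_j$. Prove that

$$E\left[\sum_{j=1}^{n} b_j(Y_j - \bar{Y})^2\right] = (n-1)\sigma^2 + \sum_{j=1}^{n} a_j\left(b_j - \frac{b_j^2}{b}\right).$$

16.5 Suppose that given $\Theta = (\Theta_1, \Theta_2)$ the random variable $X$ is normally
distributed with mean $\Theta_1$ and variance $\Theta_2$.

(a) Show that $E(X) = E(\Theta_1)$ and $\mathrm{Var}(X) = E(\Theta_2) + \mathrm{Var}(\Theta_1)$.

(b) If $\Theta_1$ and $\Theta_2$ are independent, show that $X$ has the same distribution
as $\Theta_1 + Y$, where $\Theta_1$ and $Y$ are independent and $Y$ conditional
on $\Theta_2$ is normally distributed with mean 0 and variance $\Theta_2$.

16.6 Suppose that $\Theta$ has pdf $\pi(\theta)$, $\theta > 0$, and $\Theta_1$ has pdf $\pi_1(\theta) = \pi(\theta - \alpha)$, $\theta
> \alpha > 0$. If, given $\Theta_1$, $X$ is Poisson distributed with mean $\Theta_1$, show that $X$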




16.3 LIMITED FLUCTUATION CREDIBILITY THEORY 

This branch of credibility theory represents the first attempt to quantify the 
credibility problem. This approach was suggested in the early part of the 


—— 








year), or the average amount of losses per vehicle for a fleet of delivery
trucks owned by a food wholesaler.

We first present one approach to decide whether to assign full credibility
(charge $\bar{X}$), and then we present an approach to assign partial credibility if it
is felt that full credibility is inappropriate.





































16.3.1 Full credibility 


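One method of quantifying the stability of $\bar{X}$ is to infer that $\bar{X}$ is stable if
the difference between $\bar{X}$ and $\xi$ is small relative to $\xi$ with high probability.
In statistical terms, this means that we should select two numbers $r > 0$ and
$0 < p < 1$ (with $r$ close to 0 and $p$ close to 1, common choices being $r = 0.05$
and $p = 0.9$) and assign full credibility if

$$\Pr(-r\xi \le \bar{X} - \xi \le r\xi) \ge p. \tag{16.13}$$

It is convenient to restate (16.13) as

$$\Pr\left(\left|\frac{\bar{X} - \xi}{\sigma/\sqrt{n}}\right| \le \frac{r\xi\sqrt{n}}{\sigma}\right) \ge p.$$

Now let $y_p$ be defined by

$$y_p = \inf_y\left\{\Pr\left(\left|\frac{\bar{X} - \xi}{\sigma/\sqrt{n}}\right| \le y\right) \ge p\right\}.$$

That is, $y_p$ is the smallest value of $y$ which satisfies the probability statement

Then the condition for full credibility is $r\xi\sqrt{n}/\sigma \ge y_p$,

$$\frac{\sigma}{\xi} \le \frac{r}{y_p}\sqrt{n} = \sqrt{\frac{n}{\lambda_0}}, \tag{16.16}$$

where $\lambda_0 = (y_p/r)^2$. Condition (16.16) states that full credibility is assigned
if the coefficient of variation $\sigma/\xi$ is no larger than $\sqrt{n/\lambda_0}$, an intuitively
reasonable result.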

Also of interest is that (16.16) can be rewritten to show that full credibility

$(\bar{X} - \xi)/(\sigma/\sqrt{n})$ has a standard normal distribution. Then (16.15) becomes (where
$Z$ has a standard normal distribution and $\Phi(y)$ is its cdf)

$$\begin{aligned}
p &= \Pr(|Z| \le y_p)\\
&= \Pr(-y_p \le Z \le y_p)\\
&= \Phi(y_p) - \Phi(-y_p)\\
&= \Phi(y_p) - 1 + \Phi(y_p)\\
&= 2\Phi(y_p) - 1.
\end{aligned}$$

Therefore $\Phi(y_p) = (1 + p)/2$ and therefore $y_p$ is the $(1 + p)/2$ percentile of the
standard normal distribution.

For example, if $p = 0.9$, then standard normal tables give $y_{0.9} = 1.645$.
If, in addition, $r = 0.05$, then $\lambda_0 = (32.9)^2 = 1{,}082.41$ and (16.18) yields
$n \ge 1{,}082.41\sigma^2/\xi^2$. Note that this answer assumes we know the coefficient
of variation of $X_j$. It is possible we have some idea of its value, even though
we do not know the value of $\xi$ (remember, that is the quantity we want to
estimate).
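The percentile $y_p$ and the constant $\lambda_0 = (y_p/r)^2$ can be obtained without tables from the standard library. This is a minimal sketch under the normal approximation described above; the function name `full_credibility_constant` is a hypothetical helper.

```python
from statistics import NormalDist

def full_credibility_constant(p, r):
    """Return (y_p, lambda_0), where y_p is the (1 + p)/2 percentile of
    the standard normal distribution and lambda_0 = (y_p / r) ** 2."""
    y_p = NormalDist().inv_cdf((1 + p) / 2)
    return y_p, (y_p / r) ** 2

y_p, lambda_0 = full_credibility_constant(0.9, 0.05)
print(round(y_p, 3), round(lambda_0, 2))
```

The unrounded percentile gives a value of $\lambda_0$ slightly below 1,082.41 (about 1,082.2), because the text rounds $y_{0.9}$ to 1.645 before squaring.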

The important thing to note when using (16.18) is that the coefficient of
variation is for the estimator of the quantity to be estimated. The right-hand
side gives the standard for full credibility when measuring it in terms of
exposure units. If some other unit is desired, it is usually sufficient to multiply
both sides by an appropriate quantity. Finally, any unknown quantities will



common cases. 


Example 16.8 Suppose past losses $X_1, \ldots, X_n$ are available for a particular
policyholder. The sample mean is to be used to estimate $\xi = E(X_j)$. Determine
the standard for full credibility. Then suppose there were 10 observations
with 6 being zero and the others being 253, 398, 439, and 756. Determine the
full-credibility standard for this case with $r = 0.05$ and $p = 0.9$.

The solution is available directly from (16.18) as

$$n \ge \lambda_0\left(\frac{\sigma}{\xi}\right)^2.$$

For this specific case, the mean and standard deviation can be estimated from
the data as 184.6 and 267.89 (where the variance estimate is the unbiased
version using $n - 1$). With $\lambda_0 = 1{,}082.41$, the standard is

$$n \ge 1{,}082.41\left(\frac{267.89}{184.6}\right)^2 = 2{,}279.51.$$

In the next example it is further assumed that the observations are from a
particular type of distribution.
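The calculation in Example 16.8 can be reproduced from the ten observations. In this sketch the value of $\lambda_0$ is taken from the text, and the variable names are illustrative.

```python
from statistics import mean, stdev

losses = [0, 0, 0, 0, 0, 0, 253, 398, 439, 756]
lambda_0 = 1082.41                  # from r = 0.05 and p = 0.9

xi_hat = mean(losses)               # estimate of xi
sigma_hat = stdev(losses)           # unbiased version (divisor n - 1)
standard = lambda_0 * (sigma_hat / xi_hat) ** 2

print(xi_hat, round(sigma_hat, 2))  # 184.6 267.89
print(round(standard, 2))
print(len(losses) >= standard)      # False
```

Carrying the unrounded standard deviation through gives a standard of about 2,279.56 rather than 2,279.51; the small difference comes only from rounding 267.89 before squaring. Either way, 10 observations fall far short.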

Example 16.9 Suppose that past losses $X_1, \ldots, X_n$ are available for a particular



I 





While it appears that no estimation is needed for this standard, it is in terms
of the expected number of claims needed. In practice, the standard is set in
terms of the actual number of claims experienced, effectively replacing $n\lambda$ on
the left by its estimate $N_1 + \cdots + N_n$.

For the given data, there were 5 claims, for an estimate of $\lambda$ of 0.5 per
policy. The standard is then

$$n \ge \frac{1{,}082.41}{0.5} = 2{,}164.82$$

and the 10 policies are far short of this standard. Or the 5 actual claims could
be compared to $\lambda_0 = 1{,}082.41$, which leads to the same result.

Case 2: When accuracy is with regard to the average total payment, we have
$\xi = E(X_j) = \lambda\theta_Y$ and $\mathrm{Var}(X_j) = \lambda(\theta_Y^2 + \sigma_Y^2)$, formulas developed in Chapter
6. In terms of the sample size, the standard is

$$n \ge \lambda_0\frac{\lambda(\theta_Y^2 + \sigma_Y^2)}{\lambda^2\theta_Y^2} = \frac{\lambda_0}{\lambda}\left[1 + \left(\frac{\sigma_Y}{\theta_Y}\right)^2\right].$$


入(碎 + a 冬）—入 0 
a i 0|~ = T 



















In these examples, the standard for full credibility is not met, and so the
sample means are not sufficiently accurate to be used as estimates of the
expected value. We need a method for dealing with this situation.


16.3.2 Partial credibility

If it is decided that full credibility is inappropriate, then for competitive
reasons (or otherwise) it may be desirable to reflect the past experience $\bar{X}$ in the
net premium as well as the externally obtained mean, $M$. An intuitively
appealing method for doing this is through a weighted average, that is, through
the credibility premium

$$P_c = Z\bar{X} + (1 - Z)M, \tag{16.19}$$

where the credibility factor $Z \in [0, 1]$ needs to be chosen. There are many
formulas for $Z$ which have been suggested in the actuarial literature, usually
justified on intuitive rather than theoretical grounds. (We remark that Mowbray
[96] considered full, but not partial credibility.) One important choice
is

$$Z = \frac{n}{n + k}, \tag{16.20}$$

where $k$ needs to be determined. This particular choice will be shown to be
theoretically justified on the basis of a statistical model to be presented in the
next section. Another choice, based on the same idea as full credibility (and
including the full-credibility case $Z = 1$), will now be discussed.

A variety of arguments have been used for developing the value of $Z$, many
of which lead to the same answer. All of them are flawed in one way or
another. The development we have chosen to present is also flawed but is
at least simple. Recall that the goal of the full-credibility standard was to
ensure that the difference between the net premium we are considering ($\bar{X}$)
and what we should be using ($\xi$) is small with high probability. Because $\bar{X}$
is unbiased, this is essentially (and exactly if $\bar{X}$ has the normal distribution)

credibility premium, $P_c$, as follows:

$$\begin{aligned}
\frac{\xi^2}{\lambda_0} &= \mathrm{Var}(P_c)\\
&= \mathrm{Var}[Z\bar{X} + (1 - Z)M]\\
&= Z^2\,\mathrm{Var}(\bar{X})\\
&= Z^2\frac{\sigma^2}{n}.
\end{aligned}$$

Thus $Z = (\xi/\sigma)\sqrt{n/\lambda_0}$, provided it is less than 1. This can be written
using the single formula

$$Z = \min\left\{\frac{\xi}{\sigma}\sqrt{\frac{n}{\lambda_0}},\ 1\right\}. \tag{16.21}$$

One interpretation of (16.21) is that the credibility factor $Z$ is the ratio of
the coefficient of variation required for full credibility ($\sqrt{n/\lambda_0}$) to the actual
coefficient of variation. For obvious reasons this is often called the square root
rule for partial credibility.

While we could do the algebra with regard to (16.21), it is sufficient to note
that it always turns out that $Z$ is the square root of the ratio of the actual
count to the count required for full credibility.

Example 16.10 Suppose in Example 16.8 that the manual premium M is 
225. Determine the credibility estimate. 

The average of the payments is 184.6. With the square root rule the
credibility factor is

$$Z = \sqrt{\frac{10}{2{,}279.51}} = 0.06623.$$

Then the credibility premium is

$$P_c = 0.06623(184.6) + 0.93377(225) = 222.32. \qquad \square$$

Example 16.11 Suppose in Example 16.9 that the manual premium M is
225. Determine the credibility estimate using both cases.

For the first case, the credibility factor is 


and applying it yields 

P c = 0.06797(184.6) + 0.93203(225) = 222.25. 

At first glance this may appear inappropriate. The standard was set in terms 
of estimating the frequency but was applied to the aggregate claims. Often, 
individuals are distinguished more by differences in the frequency with which 
they have claims rather than by differences in the cost per claim. So this 
factor captures the most important feature. 

For the second case, we can use any of the three calculations: 


P c = 0.06048(184.6) + 0.93952(225) = 222.56. □

Earlier we mentioned a flaw in the approach. Other than assuming that the
variance captures the variability of $P_c$ in the right way, all of the mathematics
is correct. The flaw is in the goal. Unlike $\bar{X}$, $P_c$ is not an unbiased estimator
of $\xi$. In fact, one of the qualities that allows credibility to work is its use
of biased estimators. But that means that the appropriate measure of the
quality of $P_c$ is not its variance but its expected squared error, which depends
on the relationship between $\xi$ and $M$. As noted
in the next subsection, this is not only a problem with our determination of
$Z$, it is a problem that is characteristic of the limited fluctuation approach.
A model for this relationship is introduced in the next section.

This section closes with a few additional examples. In each of the first two
examples $\lambda_0 = 1{,}082.41$ is used.


Example 16.12 For group dental insurance, historical experience on many
groups has revealed that annual losses per life insured have a mean of 175
and a standard deviation of 140. A particular group has been covered for two
years with 100 lives insured in year 1 and 110 in year 2 and has experienced
average claims of 150 over that period. Determine if full or partial credibility
is appropriate, and determine the credibility premium for next year's losses if
there will be 125 lives insured.

We will apply the credibility on a per-life-insured basis. We have observed
100 + 110 = 210 exposure units (assume experience is independent for different
lives and years), and $\bar{X} = 150$. Now $M = 175$ and we assume that $\sigma$ will be
140 for this group. Because we are trying to estimate the average cost per
person, the calculations done in Example 16.11 for Case 2 apply. Thus, with
$n = 210$ and $\lambda_0 = 1{,}082.41$ we estimate $\xi$ with the sample mean of 150 to
obtain the standard for full credibility as

$$n \ge \lambda_0\left(\frac{\sigma}{\xi}\right)^2 = 1{,}082.41\left(\frac{140}{150}\right)^2 = 942.90$$

and then calculate

$$Z = \sqrt{\frac{210}{942.90}} = 0.472$$

(note that $\bar{X}$ is the average of 210 claims, so approximate normality is assumed
by the central limit theorem). Thus, the net premium per life insured is

$$P_c = 0.472(150) + 0.528(175) = 163.2.$$

The net premium for the whole group is 125(163.2) = 20,400. □
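
The arithmetic of Example 16.12 can be sketched in a few lines of Python. This is only an illustration of the square root rule under the assumptions stated in the example; the function names (`full_credibility_standard`, `partial_credibility_factor`, `credibility_premium`) are hypothetical and do not come from the text.

```python
import math


def full_credibility_standard(lambda0, sigma, xi):
    """Exposure units needed for full credibility: lambda0 * (sigma / xi)^2."""
    return lambda0 * (sigma / xi) ** 2


def partial_credibility_factor(n, standard):
    """Square root rule: Z = min(sqrt(n / standard), 1)."""
    return min(math.sqrt(n / standard), 1.0)


def credibility_premium(z, sample_mean, manual_premium):
    """Limited fluctuation premium: Z * X-bar + (1 - Z) * M."""
    return z * sample_mean + (1.0 - z) * manual_premium


# Inputs taken from Example 16.12 (lambda0 = 1,082.41, sigma = 140, xi estimated by 150).
standard = full_credibility_standard(1082.41, 140.0, 150.0)
z = partial_credibility_factor(210, standard)
premium_per_life = credibility_premium(z, 150.0, 175.0)
group_premium = 125 * premium_per_life

print(round(standard, 2), round(z, 3), round(premium_per_life, 1))
```

Because the unrounded value of Z is carried through, the group premium from this sketch differs from 20,400 by a small rounding amount.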

Example 16.13 An insurance coverage involves credibility based on number
of claims only. For a particular group, 715 claims have been observed. Determine
an appropriate credibility factor, assuming that the number of claims is
Poisson distributed.

This is Case 1 from Example 16.11 and the standard for full credibility
with regard to the number of claims is $n\lambda \ge \lambda_0 = 1{,}082.41$. Then

$$Z = \sqrt{\frac{715}{1{,}082.41}} = 0.813. \qquad \square$$

Example 16.14 Past data on a particular group are $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$,
where the $X_j$ are independent and identically distributed compound Poisson
random variables with exponentially distributed claim sizes. If the credibility
factor based on claim numbers is 0.8, determine the appropriate credibility
factor based on total claims.

When based on Poisson claim numbers, from Example 16.9, $Z = 0.8$ implies
that $\lambda n/\lambda_0 = (0.8)^2 = 0.64$, where $\lambda n$ is the observed number of claims. For
exponentially distributed claim sizes $\sigma_Y = \theta_Y$. From Case 2 of Example 16.9,
the standard for full credibility in terms of the number of claims is
$\lambda n \ge \lambda_0[1 + (\sigma_Y/\theta_Y)^2] = 2\lambda_0$, and so

$$Z = \sqrt{\frac{\lambda n}{2\lambda_0}} = \sqrt{0.32} = 0.566. \qquad \square$$

16.3.3 Problems with the approach

While the limited fluctuation approach yields simple solutions to the problem,
there are theoretical difficulties. First, there is no underlying theoretical model
for the distribution of the $X_j$s and thus no reason why a premium of the form
(16.19) is appropriate and preferable to $M$. Why not just estimate $\xi$ from a
collection of homogeneous policyholders and charge all policyholders the same
rate? While there is a practical reason for using (16.19), no model has been
presented to suggest that this may be appropriate. Consequently, the choice
of $Z$ (and hence $P_c$) is completely arbitrary.

Second, even if (16.19) were appropriate for a particular model, there is no
guidance for the selection of $r$ and $p$.

Finally, the limited fluctuation approach does not examine the difference
between $\xi$ and $M$. When (16.19) is employed, we are essentially stating that




Furthermore, the intuitively appealing formula (16.19) is a consequence
of this approach, and $Z$ is often obtained from relations of the form (16.20).


16.3.4 Notes and References 


The limited fluctuation approach is discussed by Herzog [52] and
Longley-Cook [87]. See also Norberg [100].


16.7 An insurance company has decided to establish its full-credibility
requirements for an individual state rate filing. The full-credibility standard
is to be set so that the observed total amount of claims underlying the rate 
filing would be within 5% of the true value with probability 0.95. The claim 
frequency follows a Poisson distribution and the severity distribution has pdf 


Determine the expected number of claims necessary to obtain full credibility 
using the normal approximation. 









credibility is appropriate and determine the net premium for next year’s claims 
assuming the normal approximation. Use r = 0.05 and p = 0.9. 

16.10 Redo Example 16.10 assuming that $X_j$ is a compound negative
binomial distribution rather than compound Poisson.

16.11 (*) The total number of claims for a group of insureds is Poisson with
mean $\lambda$. Determine the value of $\lambda$ such that the observed number of claims will
be within 3% of $\lambda$ with a probability of 0.975 using the normal approximation.

16.12 (*) An insurance company is revising rates based on old data. The 
expected number of claims for full credibility is selected so that observed 
total claims will be within 5% of the true value 90% of the time. Individual 
claim amounts have pdf f(x) = 1/200,000, 0 < x < 200,000, and the number 
of claims has the Poisson distribution. The recent experience consists of 1,082 
claims. Determine the credibility, Z, to be assigned to the recent experience. 
Use the normal approximation. 

16.13 (*) The average claim size for a group of insureds is 1,500 with a
standard deviation of 7,500. Assume that claim counts have the Poisson
distribution. Determine the expected number of claims so that the total loss
will be within 6% of the expected total loss with probability 0.90.

16.14 (*) A group of insureds had 6,000 claims and a total loss of 15,600,000.
The prior estimate of the total loss was 16,500,000. Determine the limited
fluctuation credibility estimate of the total loss for the group. Use the standard
for full credibility determined in Exercise 16.13.


16.15 (*) A full-credibility standard is set so that the total number of
claims is to be within 5% of the true value with probability $p$. This standard is
800 claims. The standard is then altered so that the total cost of claims is
to be within 10% of the true value with probability $p$. The claim frequency
has a Poisson distribution and the claim severity distribution has pdf
$f(x) = 0.0002(100 - x)$, $0 < x < 100$. Determine the expected number of claims
necessary to obtain full credibility under the new standard.


16.16 (*) A standard for full credibility of 1,000 claims has been selected
so that the actual pure premium will be within 10% of the expected pure
premium 95% of the time. The number of claims has the Poisson distribution.
Determine the coefficient of variation of the severity distribution.


3. The observed number of claims is 10,000. 

4. The number of claims required for full credibility is 17,500. 

Determine the credibility estimate of the group's expected total losses based
upon all the above information. Use the credibility factor that is appropriate
if the goal is to estimate the expected number of losses.


16.18 (*) A full-credibility standard is determined so that the total number of
claims is within 5% of the expected number with probability 98%. If the same
expected number of claims for full credibility is applied to the total cost of
claims, the actual total cost would be within $100K$% of the expected cost with
95% probability. Individual claims have severity pdf $f(x) = 2.5x^{-3.5}$, $x > 1$,
and the number of claims has the Poisson distribution. Determine $K$.


16.19 (*) The number of claims has the Poisson distribution. The number of
claims and the claim severity are independent. Individual claim amounts can
be for 1, 2, or 10 with probabilities 0.5, 0.3, and 0.2, respectively. Determine
the expected number of claims needed so that the total cost of claims is within
10% of the expected cost with 90% probability.


16.20 (*) The number of claims has the Poisson distribution. The coefficient
of variation of the severity distribution is 2. The standard for full credibility
in estimating total claims is 3,415. With this standard the observed pure
premium will be within $k$% of the expected pure premium 95% of the time.
Determine $k$.

16.21 (*) You are given the following:

1. P = Prior estimate of pure premium for a particular class of business.

2. O = Observed pure premium during the latest experience period for the
same class of business.

3. R = Revised estimate of pure premium for the same class following the
observations.

4. F = Number of claims required for full credibility of the pure premium.

Express the observed number of claims as a function of these four items.

16.4.1 Introduction

In this and the following section, we consider a model-based approach to
the solution of the credibility problem. This approach, referred to as greatest
accuracy credibility theory,


rate for the policyholder? 

To proceed, let us assume that the risk level of each policyholder in the
rating class may be characterized by a risk parameter $\theta$ (possibly vector
valued), but the value of $\theta$ varies by policyholder. This allows us to quantify
the differences between policyholders with respect to the risk characteristics.
Because all observable underwriting characteristics have already been used,
$\theta$ may be viewed as representative of the residual, unobserved factors which
affect the risk level. Consequently, we shall assume the existence of $\theta$, but we
shall further assume that it is not observable and that we can never know its
true value.

Because $\theta$ varies by policyholder, there is a probability distribution with pf
$\pi(\theta)$ of these values across the rating class. Thus, if $\theta$ is a scalar parameter, the
cumulative distribution function $\Pi(\theta)$ may be interpreted as the proportion
of policyholders in the rating class with risk parameter $\Theta$ less than or equal
to $\theta$. [In statistical terms, $\Theta$ is a random variable with distribution function
$\Pi(\theta) = \Pr(\Theta \le \theta)$.] Stated another way, $\Pi(\theta)$ represents the probability that
a policyholder picked at random from the rating class has a risk parameter
































with probabilities 0.5, 0.3, and 0.2, respectively. Describe this process and how 
it relates to an unknown risk parameter. 


When a driver buys our insu 
good or bad driver. So the risk 
can set 0 = G for good drivers 


I 


Example 16.17 The amount of a claim has the exponential distribution with
mean $1/\Theta$. Among the class of insureds and potential insureds, the parameter
$\Theta$ varies according to the gamma distribution with $\alpha = 4$ and scale parameter
$\beta = 0.001$. Provide a mathematical description of this model.

For claims,

$$f_{X|\Theta}(x|\theta) = \theta e^{-\theta x}, \quad x, \theta > 0,$$

and for the risk parameter,

$$\pi_\Theta(\theta) = \frac{\theta^3 e^{-1{,}000\theta}\,1{,}000^4}{6}, \quad \theta > 0. \qquad \square$$

16.4.2 The Bayesian methodology

Continue to assume that the distribution of the risk characteristics in the
population may be represented by $\pi(\theta)$, and the experience of a particular
policyholder with risk parameter $\theta$ arises from the conditional distribution
$f_{X|\Theta}(x|\theta)$ of claims or losses given $\theta$.

We now return to the problem introduced in Section 16.3. That is, for a
particular policyholder, we have observed $\mathbf{X} = \mathbf{x}$, where $\mathbf{X} = (X_1, \ldots, X_n)^T$
and $\mathbf{x} = (x_1, \ldots, x_n)^T$, and are interested in setting a rate to cover $X_{n+1}$. We
assume that the risk parameter associated with the policyholder is $\theta$ (which
is unknown). Furthermore, the experience of the policyholder corresponding
to different exposure periods is assumed to be independent. In statistical


with the risk process. 

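Here we repeat the development in Section 12.4, noting that if $\Theta$ has a
discrete distribution the integrals are replaced by sums. Because the $X_j$s are
independent conditional on $\Theta = \theta$, we have

$$f_{\mathbf{X},\Theta}(\mathbf{x}, \theta) = f(x_1, \ldots, x_n|\theta)\,\pi(\theta) = \left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta).$$

The joint distribution of $\mathbf{X}$ is thus the marginal distribution obtained by
integrating $\theta$ out, that is,

$$f_{\mathbf{X}}(\mathbf{x}) = \int \left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta)\,d\theta. \tag{16.22}$$

Similarly, the joint distribution of $X_1, \ldots, X_{n+1}$ is the right-hand side of
(16.22) with $n$ replaced by $n + 1$ in the product. Finally, the conditional
density of $X_{n+1}$ given $\mathbf{X} = \mathbf{x}$ is the joint density of $(X_1, \ldots, X_{n+1})$ divided
by that of $\mathbf{X}$, namely,

$$f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x}) = \frac{1}{f_{\mathbf{X}}(\mathbf{x})}\int \left[\prod_{j=1}^{n+1} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta)\,d\theta. \tag{16.23}$$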

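There is a hidden mathematical structure underlying (16.23) which may
often be exploited. The posterior density of $\Theta$ given $\mathbf{X}$ is

$$\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x}) = \frac{f_{\mathbf{X},\Theta}(\mathbf{x}, \theta)}{f_{\mathbf{X}}(\mathbf{x})} = \frac{1}{f_{\mathbf{X}}(\mathbf{x})}\left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta). \tag{16.24}$$

In other words, $\left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta) = \pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,f_{\mathbf{X}}(\mathbf{x})$, and substitution
in the numerator of (16.23) yields

$$f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x}) = \int f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta. \tag{16.25}$$

Equation (16.25) provides the additional insight that the conditional
distribution of $X_{n+1}$ given $\mathbf{X}$ may be viewed as a mixture distribution, with the
mixing distribution the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$.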

The posterior distribution combines and summarizes the information about
$\Theta$ contained in the prior distribution and the likelihood and consequently
(16.25) reflects this information. As noted in Theorem 12.49, the posterior
distribution admits a convenient form when the likelihood is derived from the
linear exponential family and $\pi(\theta)$ is the natural conjugate prior. This
provides an easy method to evaluate the conditional distribution of $X_{n+1}$ given
$\mathbf{X}$ in these cases.

Example 16.18 (Example 16.16 continued) For a particular policyholder
suppose we have observed $x_1 = 0$ and $x_2 = 1$. Determine the predictive
distribution of $X_3|X_1 = 0, X_2 = 1$ and the posterior distribution of
$\Theta|X_1 = 0, X_2 = 1$.

From (16.22), the marginal probability is

$$f_{\mathbf{X}}(0, 1) = \sum_\theta f_{X_1|\Theta}(0|\theta)\,f_{X_2|\Theta}(1|\theta)\,\pi(\theta) = 0.7(0.2)(0.75) + 0.5(0.3)(0.25) = 0.1425.$$

Similarly, the joint probability of all three variables is

$$f_{\mathbf{X},X_3}(0, 1, x_3) = \sum_\theta f_{X_1|\Theta}(0|\theta)\,f_{X_2|\Theta}(1|\theta)\,f_{X_3|\Theta}(x_3|\theta)\,\pi(\theta).$$

Thus,

$$f_{\mathbf{X},X_3}(0, 1, 0) = 0.7(0.2)(0.7)(0.75) + 0.5(0.3)(0.5)(0.25) = 0.09225,$$
$$f_{\mathbf{X},X_3}(0, 1, 1) = 0.7(0.2)(0.2)(0.75) + 0.5(0.3)(0.3)(0.25) = 0.03225,$$
$$f_{\mathbf{X},X_3}(0, 1, 2) = 0.7(0.2)(0.1)(0.75) + 0.5(0.3)(0.2)(0.25) = 0.01800.$$

The predictive distribution is then

$$f_{X_3|\mathbf{X}}(0|0, 1) = \frac{0.09225}{0.1425} = 0.647368,$$
$$f_{X_3|\mathbf{X}}(1|0, 1) = \frac{0.03225}{0.1425} = 0.226316,$$
$$f_{X_3|\mathbf{X}}(2|0, 1) = \frac{0.01800}{0.1425} = 0.126316.$$

The posterior probabilities are, from (16.24),

$$\pi(G|0, 1) = \frac{f(0|G)\,f(1|G)\,\pi(G)}{f(0, 1)} = \frac{0.7(0.2)(0.75)}{0.1425} = 0.736842,$$
$$\pi(B|0, 1) = \frac{f(0|B)\,f(1|B)\,\pi(B)}{f(0, 1)} = \frac{0.5(0.3)(0.25)}{0.1425} = 0.263158.$$

From this point forward the subscripts on $f$ and $\pi$ will be dropped unless
needed for clarity. The predictive probabilities could also have been obtained
using (16.25). This method is often simpler from a computational viewpoint.

$$f(0|0, 1) = \sum_\theta f(0|\theta)\,\pi(\theta|0, 1) = 0.7(0.736842) + 0.5(0.263158) = 0.647368,$$
$$f(1|0, 1) = 0.2(0.736842) + 0.3(0.263158) = 0.226316,$$
$$f(2|0, 1) = 0.1(0.736842) + 0.2(0.263158) = 0.126316,$$

which matches the previous calculations. □
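
The calculations in Example 16.18 can be reproduced with a short Python sketch. The conditional probabilities and prior weights below are the ones used in the example; the dictionary layout and the function names (`posterior`, `predictive`) are illustrative assumptions rather than anything prescribed by the text.

```python
# f(x | theta) for x = 0, 1, 2 and the prior pi(theta), as used in Example 16.18.
likelihood = {"G": [0.7, 0.2, 0.1], "B": [0.5, 0.3, 0.2]}
prior = {"G": 0.75, "B": 0.25}


def posterior(observations):
    """Return (posterior probabilities, marginal probability) following (16.24) and (16.22)."""
    joint = {}
    for theta, weight in prior.items():
        for x in observations:
            weight *= likelihood[theta][x]
        joint[theta] = weight
    marginal = sum(joint.values())
    return {theta: w / marginal for theta, w in joint.items()}, marginal


def predictive(post):
    """Mix the conditional distributions over the posterior, as in (16.25)."""
    return [sum(post[t] * likelihood[t][x] for t in post) for x in range(3)]


post, marginal = posterior([0, 1])
pred = predictive(post)
print(marginal, post, pred)
```

Working through the posterior first, as the code does, mirrors the remark that (16.25) is often the simpler route computationally.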

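Example 16.19 (Example 16.17 continued) Suppose a person had claims of
100, 950, and 450. Determine the predictive distribution of the fourth claim
and the posterior distribution of $\Theta$.

The marginal density at the observed values is

$$f(100, 950, 450) = \int_0^\infty \theta e^{-100\theta}\,\theta e^{-950\theta}\,\theta e^{-450\theta}\,\frac{1{,}000^4}{6}\,\theta^3 e^{-1{,}000\theta}\,d\theta = \frac{1{,}000^4}{6}\int_0^\infty \theta^6 e^{-2{,}500\theta}\,d\theta = \frac{1{,}000^4}{6}\,\frac{720}{2{,}500^7}.$$

Similarly,

$$f(100, 950, 450, x_4) = \int_0^\infty \theta e^{-100\theta}\,\theta e^{-950\theta}\,\theta e^{-450\theta}\,\theta e^{-\theta x_4}\,\frac{1{,}000^4}{6}\,\theta^3 e^{-1{,}000\theta}\,d\theta = \frac{1{,}000^4}{6}\int_0^\infty \theta^7 e^{-(2{,}500 + x_4)\theta}\,d\theta = \frac{1{,}000^4}{6}\,\frac{5{,}040}{(2{,}500 + x_4)^8}.$$

Then the predictive density is

$$f(x_4|100, 950, 450) = \frac{\dfrac{1{,}000^4}{6}\,\dfrac{5{,}040}{(2{,}500 + x_4)^8}}{\dfrac{1{,}000^4}{6}\,\dfrac{720}{2{,}500^7}} = \frac{7(2{,}500)^7}{(2{,}500 + x_4)^8},$$

which is a Pareto density with parameters 7 and 2,500.

For the posterior distribution we take a shortcut. The denominator is an
integral that produces a number and can be ignored for now. The numerator
can be written

$$\pi(\theta|100, 950, 450) \propto \theta e^{-100\theta}\,\theta e^{-950\theta}\,\theta e^{-450\theta}\,\frac{1{,}000^4}{6}\,\theta^3 e^{-1{,}000\theta} \propto \theta^6 e^{-2{,}500\theta},$$

and we recognize this function as that of a gamma distribution with parameters 7 and
1/2,500. Therefore,

$$\pi(\theta|100, 950, 450) = \frac{\theta^6 e^{-2{,}500\theta}\,2{,}500^7}{\Gamma(7)}.$$

Then the predictive density can be alternatively calculated from

$$f(x_4|100, 950, 450) = \int_0^\infty \theta e^{-\theta x_4}\,\frac{\theta^6 e^{-2{,}500\theta}\,2{,}500^7}{\Gamma(7)}\,d\theta = \frac{2{,}500^7}{6!}\int_0^\infty \theta^7 e^{-(2{,}500 + x_4)\theta}\,d\theta = \frac{2{,}500^7}{6!}\,\frac{7!}{(2{,}500 + x_4)^8}.$$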
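
As a numerical cross-check of Example 16.19, the mixture integral (16.25) can be approximated with a simple midpoint rule and compared with the closed-form Pareto density. This is only a sketch: the integration limit, the step count, and the function names are arbitrary choices made for illustration and are not part of the text.

```python
import math

POSTERIOR_SHAPE = 7        # 4 (prior shape) + 3 observed claims
POSTERIOR_RATE = 2500.0    # 1,000 + 100 + 950 + 450


def posterior_density(theta):
    """Gamma posterior density with shape 7 and scale 1/2,500."""
    return (theta ** (POSTERIOR_SHAPE - 1) * math.exp(-POSTERIOR_RATE * theta)
            * POSTERIOR_RATE ** POSTERIOR_SHAPE / math.factorial(POSTERIOR_SHAPE - 1))


def predictive_numeric(x, upper=0.05, steps=50_000):
    """Midpoint-rule approximation of the integral in (16.25)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += theta * math.exp(-theta * x) * posterior_density(theta)
    return total * h


def predictive_pareto(x):
    """Closed form from the example: 7(2,500)^7 / (2,500 + x)^8."""
    return 7 * 2500.0 ** 7 / (2500.0 + x) ** 8


for x4 in (0.0, 500.0, 5000.0):
    print(x4, predictive_numeric(x4), predictive_pareto(x4))
```

The upper limit of 0.05 is far in the tail of the posterior, whose mean is 7/2,500 = 0.0028, so the truncation error is negligible for this illustration.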
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To return to the original problem, we have observed X = x for a particular 
policyholder and we wish to predict $X_{n+1}$ (or its mean). An obvious choice
would be the hypothetical mean (or individual premium)

$$\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta) = \int x_{n+1}\,f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,dx_{n+1} \tag{16.26}$$

if we knew $\theta$. Note that replacement of $\theta$ by $\Theta$ in (16.26) yields, upon taking
the expectation,

$$\mu_{n+1} = E(X_{n+1}) = E[E(X_{n+1}|\Theta)] = E[\mu_{n+1}(\Theta)]$$






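can do is try to use the data. This suggests the use of the Bayesian premium
(the mean of the predictive distribution)

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \int x_{n+1}\,f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})\,dx_{n+1}. \tag{16.27}$$

A computationally more convenient form is

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \int \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta. \tag{16.28}$$

In other words, the Bayesian premium is the expected value of the hypothetical
means, with expectation taken over the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$. We
remind the reader that the integrals are replaced by sums in the discrete case.
To prove (16.28), we see from (16.25) that

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \int x_{n+1}\,f_{X_{n+1}|\mathbf{X}}(x_{n+1}|\mathbf{x})\,dx_{n+1}$$
$$= \int x_{n+1}\left[\int f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta\right]dx_{n+1}$$
$$= \int \left[\int x_{n+1}\,f_{X_{n+1}|\Theta}(x_{n+1}|\theta)\,dx_{n+1}\right]\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta$$
$$= \int \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta.$$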










Example 16.20 (Example 16.18 continued) Determine the Bayesian
premium using both (16.27) and (16.28).

The (unobservable) hypothetical means are

$$\mu_3(G) = (0)(0.7) + 1(0.2) + 2(0.1) = 0.4,$$
$$\mu_3(B) = (0)(0.5) + 1(0.3) + 2(0.2) = 0.7.$$

If, as in Example 16.18, we have observed $x_1 = 0$ and $x_2 = 1$, we have the
Bayesian premium obtained directly from (16.27):

$$E(X_3|0, 1) = 0(0.647368) + 1(0.226316) + 2(0.126316) = 0.478948.$$

The (unconditional) pure premium is

$$\mu_3 = E(X_3) = \sum_\theta \mu_3(\theta)\,\pi(\theta) = (0.4)(0.75) + (0.7)(0.25) = 0.475.$$

To verify (16.28) with $x_1 = 0$ and $x_2 = 1$, we have the posterior distribution
$\pi(\theta|0, 1)$ from Example 16.18.

Thus, (16.28) yields

$$E(X_3|0, 1) = 0.4(0.736842) + 0.7(0.263158) = 0.478947$$

with the difference due to rounding. In general, the latter approach utilizing
(16.28) is simpler than the direct approach using the conditional distribution
of $X_{n+1}|\mathbf{X} = \mathbf{x}$. □
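
Both routes to the Bayesian premium in Example 16.20 can be checked with a small Python sketch. The inputs are those of the good/bad driver model used in Examples 16.18 and 16.20; the helper names below are hypothetical and chosen only for this illustration.

```python
# f(x | theta) for x = 0, 1, 2 and the prior pi(theta), as in Example 16.18.
likelihood = {"G": [0.7, 0.2, 0.1], "B": [0.5, 0.3, 0.2]}
prior = {"G": 0.75, "B": 0.25}


def hypothetical_mean(theta):
    """mu(theta) = E(X | theta)."""
    return sum(x * p for x, p in enumerate(likelihood[theta]))


def posterior(observations):
    """Posterior probabilities of theta given the observed claim counts."""
    joint = {}
    for theta, weight in prior.items():
        for x in observations:
            weight *= likelihood[theta][x]
        joint[theta] = weight
    total = sum(joint.values())
    return {theta: w / total for theta, w in joint.items()}


def bayesian_premium_direct(observations):
    """Mean of the predictive distribution, as in (16.27)."""
    post = posterior(observations)
    predictive = [sum(post[t] * likelihood[t][x] for t in post) for x in range(3)]
    return sum(x * p for x, p in enumerate(predictive))


def bayesian_premium_from_posterior(observations):
    """Posterior expectation of the hypothetical means, as in (16.28)."""
    post = posterior(observations)
    return sum(post[t] * hypothetical_mean(t) for t in post)


pure_premium = sum(prior[t] * hypothetical_mean(t) for t in prior)
direct = bayesian_premium_direct([0, 1])
via_posterior = bayesian_premium_from_posterior([0, 1])
print(pure_premium, direct, via_posterior)
```

Without intermediate rounding the two premiums agree to machine precision, which is consistent with the example attributing the sixth-decimal discrepancy to rounding.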

As expected, the revised value based on two observations is between the
prior value (0.475) based on no data and the value based only on the data (0.5).

Note that the estimate is a weighted average of the observed values and the
unconditional mean. This formula is of the credibility weighted type (16.19). □


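Here is an example where the random variables do not have identical
distributions.

Example 16.23 Suppose that the number of claims $N_j$ in year $j$ for a group
policyholder with (unknown) risk parameter $\theta$ and $m_j$ individuals in the group
is Poisson distributed with mean $m_j\theta$, that is, for $j = 1, \ldots, n$,

$$\Pr(N_j = x|\Theta = \theta) = \frac{(m_j\theta)^x e^{-m_j\theta}}{x!}, \quad x = 0, 1, 2, \ldots.$$

This would be the case if, per individual, the number of claims were
independently Poisson distributed with mean $\theta$. Determine the Bayesian expected

With these assumptions, the average number of claims per individual in
year $j$ is

$$X_j = \frac{N_j}{m_j}, \quad j = 1, \ldots, n.$$

Therefore,

$$f_{X_j|\Theta}(x_j|\theta) = \Pr[N_j = m_j x_j|\Theta = \theta].$$

Assume $\Theta$ is gamma distributed with parameters $\alpha$ and $\beta$,

$$\pi(\theta) = \frac{\theta^{\alpha-1} e^{-\theta/\beta}}{\Gamma(\alpha)\beta^\alpha}, \quad \theta > 0,$$

then the posterior distribution $\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})$ is proportional (as a function of $\theta$)
to

$$\left[\prod_{j=1}^{n} f_{X_j|\Theta}(x_j|\theta)\right]\pi(\theta),$$

which is itself proportional to

$$\left[\prod_{j=1}^{n} \theta^{m_j x_j} e^{-m_j\theta}\right]\theta^{\alpha-1} e^{-\theta/\beta} = \theta^{\alpha + \sum_{j=1}^{n} m_j x_j - 1}\,e^{-\theta\left(\beta^{-1} + \sum_{j=1}^{n} m_j\right)}.$$

This is proportional to a gamma density with parameters $\alpha_* = \alpha + \sum_{j=1}^{n} m_j x_j$
and $\beta_* = (1/\beta + \sum_{j=1}^{n} m_j)^{-1}$, and so $\Theta|\mathbf{X}$ is also gamma, but with $\alpha$ and $\beta$
replaced by $\alpha_*$ and $\beta_*$, respectively.

Now,

$$E(X_j|\Theta = \theta) = E\left(\frac{1}{m_j}N_j\,\Big|\,\Theta = \theta\right) = \frac{1}{m_j}E(N_j|\Theta = \theta) = \theta.$$

Thus $\mu_{n+1}(\theta) = E(X_{n+1}|\Theta = \theta) = \theta$ and $\mu_{n+1} = E(X_{n+1}) = E[\mu_{n+1}(\Theta)] = \alpha\beta$
because $\Theta$ is gamma distributed with parameters $\alpha$ and $\beta$. From (16.28)
and because $\Theta|\mathbf{X}$ is also gamma distributed with parameters $\alpha_*$ and $\beta_*$,

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = \int_0^\infty \mu_{n+1}(\theta)\,\pi_{\Theta|\mathbf{X}}(\theta|\mathbf{x})\,d\theta = E[\mu_{n+1}(\Theta)|\mathbf{X} = \mathbf{x}] = E(\Theta|\mathbf{X} = \mathbf{x}) = \alpha_*\beta_*.$$

Define the total number of lives observed to be $m = \sum_{j=1}^{n} m_j$.
Then,

$$E(X_{n+1}|\mathbf{X} = \mathbf{x}) = Z\bar{x} + (1 - Z)\mu_{n+1},$$

where $Z = m/(m + \beta^{-1})$ and $\bar{x} = m^{-1}\sum_{j=1}^{n} m_j x_j$ and $\mu_{n+1} = \alpha\beta$, again an
expression of the form (16.19).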


the amount of past data observed, the closer Z is to 1, consistent with our 
intuition. 
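
The identity at the end of Example 16.23 can be verified numerically. The sketch below uses made-up values for the gamma parameters, the yearly exposures, and the claim counts; they are assumptions for illustration only, as is the function name `poisson_gamma_premium`.

```python
def poisson_gamma_premium(alpha, beta, exposures, claim_counts):
    """Per-individual Bayesian premium for the Poisson-gamma model with exposures.

    Since x_j = N_j / m_j, the sum of m_j * x_j is simply the total claim count.
    Returns (alpha_star * beta_star, credibility-weighted form, Z).
    """
    m = sum(exposures)
    total_claims = sum(claim_counts)
    alpha_star = alpha + total_claims
    beta_star = 1.0 / (1.0 / beta + m)
    bayes = alpha_star * beta_star
    z = m / (m + 1.0 / beta)
    x_bar = total_claims / m
    weighted = z * x_bar + (1.0 - z) * alpha * beta
    return bayes, weighted, z


# Hypothetical inputs: prior Gamma(alpha=2, scale=0.1), three years of group experience.
bayes, weighted, z = poisson_gamma_premium(2.0, 0.1, [100, 110, 125], [25, 18, 30])
print(bayes, weighted, z)
```

With 335 lives observed and a prior worth 1/β = 10 lives of exposure, Z is close to 1, which is the behavior the closing remark of the example describes.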

16.4.3 The credibility premium 

In the previous section a systematic approach was suggested for treatment
of the past data of a particular policyholder. Ideally, rather than the pure
premium $\mu_{n+1} = E(X_{n+1})$, one would like to charge the individual premium
(or hypothetical mean) $\mu_{n+1}(\theta)$, where $\theta$ is the (hypothetical) parameter
associated with the policyholder. Because $\theta$ is unknown, this is impossible, but
we could instead condition on $\mathbf{x}$, the past data from the policyholder. This
leads to the Bayesian premium $E(X_{n+1}|\mathbf{x})$.

The major challenge with this approach is that it may be difficult to
evaluate the Bayesian premium. Of course, in simple examples such as in the
previous subsection, the Bayesian premium is not difficult to evaluate
numerically. But these examples can hardly be expected to capture the essential
features of a realistic insurance scenario. More realistic models may well
introduce analytic difficulties with respect to evaluation of $E(X_{n+1}|\mathbf{x})$, whether
one uses (16.27) or (16.28). Often, numerical integration may be required.
There are exceptions such as Examples 16.22 and 16.23.

We now present an alternative suggested by Bühlmann [18] in 1967. Recall
the basic problem: We wish to use the conditional distribution $f_{X_{n+1}|\Theta}(x_{n+1}|\theta)$
or the hypothetical mean $\mu_{n+1}(\theta)$ for estimation of next year's claims.
Because we have observed $\mathbf{x}$, one suggestion is to approximate $\mu_{n+1}(\theta)$ by a linear
function of the past data. [After all, the formula $Z\bar{X} + (1 - Z)\mu$ is of this
form.] Thus, let us restrict ourselves to estimators of the form $\alpha_0 + \sum_{j=1}^{n} \alpha_j X_j$,
where $\alpha_0, \alpha_1, \ldots, \alpha_n$ need to be chosen. To this end, we will choose the $\alpha$s to







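minimize squared error loss, that is,

$$Q = E\left\{\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n} \alpha_j X_j\right]^2\right\} \tag{16.29}$$

and the expectation is over the joint distribution of $X_1, \ldots, X_n$ and $\Theta$. That
is, the squared error is averaged over all possible values of $\Theta$ and all possible
observations. To minimize $Q$, we take derivatives. Thus,

$$\frac{\partial Q}{\partial \alpha_0} = E\left\{2\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n} \alpha_j X_j\right](-1)\right\}.$$

We shall denote by $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ the values of $\alpha_0, \alpha_1, \ldots, \alpha_n$ which minimize
(16.29). Then equating $\partial Q/\partial \alpha_0$ to 0 yields

$$E[\mu_{n+1}(\Theta)] = \tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j E(X_j).$$

But $E(X_{n+1}) = E[E(X_{n+1}|\Theta)] = E[\mu_{n+1}(\Theta)]$, and so $\partial Q/\partial \alpha_0 = 0$ implies that

$$E(X_{n+1}) = \tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j E(X_j). \tag{16.30}$$

Equation (16.30) may be termed the unbiasedness equation because it
requires that the estimate $\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j$ be unbiased for $E(X_{n+1})$.
However, the credibility estimate may be biased as an estimator of $\mu_{n+1}(\theta) =
E(X_{n+1}|\theta)$, the quantity we are trying to estimate. This bias will average
out over the members of $\Theta$. By accepting this bias we are able to reduce the
overall mean-squared error. For $i = 1, \ldots, n$, we have

$$\frac{\partial Q}{\partial \alpha_i} = E\left\{2\left[\mu_{n+1}(\Theta) - \alpha_0 - \sum_{j=1}^{n} \alpha_j X_j\right](-X_i)\right\}$$

and setting this equal to 0 yields

$$E[\mu_{n+1}(\Theta)X_i] = \tilde{\alpha}_0 E(X_i) + \sum_{j=1}^{n} \tilde{\alpha}_j E(X_i X_j).$$

The left-hand side of this equation may be reexpressed as

$$E[\mu_{n+1}(\Theta)X_i] = E\{E[X_i\,\mu_{n+1}(\Theta)|\Theta]\}$$
$$= E\{\mu_{n+1}(\Theta)\,E[X_i|\Theta]\}$$
$$= E[E(X_{n+1}|\Theta)\,E(X_i|\Theta)]$$
$$= E[E(X_{n+1}X_i|\Theta)]$$
$$= E(X_i X_{n+1}),$$

where the second from the last step follows by independence of $X_i$ and $X_{n+1}$
conditional on $\Theta$. Thus $\partial Q/\partial \alpha_i = 0$ implies

$$E(X_i X_{n+1}) = \tilde{\alpha}_0 E(X_i) + \sum_{j=1}^{n} \tilde{\alpha}_j E(X_i X_j). \tag{16.31}$$

Next multiply (16.30) by $E(X_i)$ and subtract from (16.31) to obtain

$$\mathrm{Cov}(X_i, X_{n+1}) = \sum_{j=1}^{n} \tilde{\alpha}_j\,\mathrm{Cov}(X_i, X_j), \quad i = 1, \ldots, n. \tag{16.32}$$

Equation (16.30) and the $n$ equations (16.32) together are called the normal
equations. These equations may be solved for $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to yield the
credibility premium

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j. \tag{16.33}$$

While it is straightforward to express the solution $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to the normal
equations in matrix notation (if the covariance matrix of the $X_j$s is
non-singular), we shall be content with solutions for some special cases.

Note that exactly one of the terms on the right-hand side of (16.32) is a
variance term, that is, $\mathrm{Cov}(X_i, X_i) = \mathrm{Var}(X_i)$. The other $n - 1$ terms are
true covariance terms.

As an added bonus, the values $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ also minimize

$$Q_1 = E\left\{\left[E(X_{n+1}|\mathbf{X}) - \alpha_0 - \sum_{j=1}^{n} \alpha_j X_j\right]^2\right\} \tag{16.34}$$

and

$$Q_2 = E\left[\left(X_{n+1} - \alpha_0 - \sum_{j=1}^{n} \alpha_j X_j\right)^2\right]. \tag{16.35}$$

To see this, differentiate (16.34) or (16.35) with respect to $\alpha_0, \alpha_1, \ldots, \alpha_n$
and observe that the solutions still satisfy the normal equations (16.30) and
(16.32). Thus the credibility premium (16.33) is the best linear estimator of
each of the hypothetical mean $E(X_{n+1}|\Theta)$, the Bayesian premium $E(X_{n+1}|\mathbf{X})$,
and $X_{n+1}$.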

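Example 16.24 If $E(X_j) = \mu$, $\mathrm{Var}(X_j) = \sigma^2$, and, for $i \ne j$, $\mathrm{Cov}(X_i, X_j) =
\rho\sigma^2$, where the correlation coefficient $\rho$ satisfies $-1 < \rho < 1$, determine the
credibility premium $\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j$.

The unbiasedness equation (16.30) yields

$$\mu = \tilde{\alpha}_0 + \mu\sum_{j=1}^{n} \tilde{\alpha}_j, \quad\text{or}\quad \sum_{j=1}^{n} \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}.$$

The $n$ equations (16.32) become, for $i = 1, \ldots, n$,

$$\rho = \sum_{j=1,\,j \ne i}^{n} \tilde{\alpha}_j\,\rho + \tilde{\alpha}_i,$$

or, stated another way,

$$\rho = \sum_{j=1}^{n} \tilde{\alpha}_j\,\rho + \tilde{\alpha}_i(1 - \rho), \quad i = 1, \ldots, n.$$

Thus

$$\tilde{\alpha}_i = \frac{\rho\left(1 - \sum_{j=1}^{n} \tilde{\alpha}_j\right)}{1 - \rho} = \frac{\rho\,\tilde{\alpha}_0}{\mu(1 - \rho)}$$

using the unbiasedness equation. Summation over $i$ from 1 to $n$ and a second
use of the unbiasedness equation give

$$1 - \frac{\tilde{\alpha}_0}{\mu} = \frac{n\rho\,\tilde{\alpha}_0}{\mu(1 - \rho)},$$

so that

$$\tilde{\alpha}_0 = \frac{(1 - \rho)\mu}{1 - \rho + n\rho}, \qquad \tilde{\alpha}_j = \frac{\rho}{1 - \rho + n\rho}.$$

The credibility premium is then

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = \frac{(1 - \rho)\mu}{1 - \rho + n\rho} + \sum_{j=1}^{n} \frac{\rho X_j}{1 - \rho + n\rho} = (1 - Z)\mu + Z\bar{X},$$

where $Z = n\rho/(1 - \rho + n\rho)$ and $\bar{X} = n^{-1}\sum_{j=1}^{n} X_j$. Thus, if $0 < \rho < 1$, then
$0 < Z < 1$ and the credibility premium is a weighted average of $\mu = E(X_{n+1})$
and $\bar{X}$, that is, is of the form (16.19). □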
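
The normal equations can also be solved numerically, which gives a check on the closed form obtained in Example 16.24. The sketch below builds the equicorrelated covariance matrix, solves (16.32) with a small hand-written Gaussian elimination, and recovers the constant term from (16.30). The function names and the numerical inputs are hypothetical choices for illustration.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - partial) / aug[r][r]
    return solution


def credibility_weights(mu, sigma2, rho, n):
    """Solve (16.32) for the slopes, then (16.30) for the constant term."""
    cov = [[sigma2 if i == j else rho * sigma2 for j in range(n)] for i in range(n)]
    cov_with_next = [rho * sigma2] * n      # Cov(X_i, X_{n+1}) for each i
    slopes = solve_linear_system(cov, cov_with_next)
    constant = mu * (1.0 - sum(slopes))     # unbiasedness equation
    return constant, slopes


# Hypothetical inputs: mu = 100, sigma^2 = 400, rho = 0.3, n = 5 observations.
alpha0, alphas = credibility_weights(100.0, 400.0, 0.3, 5)
z_numeric = sum(alphas)
z_closed_form = 5 * 0.3 / (1 - 0.3 + 5 * 0.3)
print(alpha0, alphas, z_numeric, z_closed_form)
```

The same two functions would accept any non-singular covariance matrix, so the equicorrelated structure is only one convenient test case.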


ances of $X_j|\Theta$ and hence the means $E(X_j)$, variances $\mathrm{Var}(X_j)$, and covariances
$\mathrm{Cov}(X_i, X_j)$.

16.4.4 The Bühlmann model

This, the first and simplest credibility model, specifies that for each
policyholder (conditional on $\Theta$) past losses $X_1, \ldots, X_n$ have the same mean and
variance and are independent and identically distributed conditional on $\Theta$.
Thus, define

$$\mu(\theta) = E(X_j|\Theta = \theta)$$

and

$$v(\theta) = \mathrm{Var}(X_j|\Theta = \theta).$$

As discussed previously, $\mu(\theta)$ is referred to as the hypothetical mean whereas
$v(\theta)$ is called the process variance. Define

$$\mu = E[\mu(\Theta)], \tag{16.36}$$

$$v = E[v(\Theta)], \tag{16.37}$$

and

$$a = \mathrm{Var}[\mu(\Theta)]. \tag{16.38}$$

The quantity $\mu$ in (16.36) is the expected value of the hypothetical
means, $v$ in (16.37) is the expected value of the process variance, and $a$
in (16.38) is the variance of the hypothetical means. Note that $\mu$ is the
estimate to use if we have no information about $\theta$ [and thus no information
about $\mu(\theta)$]. It will also be referred to as the collective premium.

The mean, variance, and covariance of the $X_j$s may now be obtained. First,

$$E(X_j) = E[E(X_j|\Theta)] = E[\mu(\Theta)] = \mu. \tag{16.39}$$

Second,

$$\mathrm{Var}(X_j) = E[\mathrm{Var}(X_j|\Theta)] + \mathrm{Var}[E(X_j|\Theta)] = E[v(\Theta)] + \mathrm{Var}[\mu(\Theta)] = v + a. \tag{16.40}$$

Finally, for $i \ne j$,

$$\mathrm{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)$$
$$= E[E(X_i X_j|\Theta)] - \mu^2$$
$$= E[E(X_i|\Theta)\,E(X_j|\Theta)] - \{E[\mu(\Theta)]\}^2$$
$$= E\{[\mu(\Theta)]^2\} - \{E[\mu(\Theta)]\}^2$$
$$= \mathrm{Var}[\mu(\Theta)]$$
$$= a. \tag{16.41}$$

This is exactly of the form of Example 16.24 with parameters $\mu$, $\sigma^2 = v + a$,
and $\rho = a/(v + a)$. Thus the credibility premium is

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = Z\bar{X} + (1 - Z)\mu, \tag{16.42}$$

where

$$Z = \frac{n}{n + k} \tag{16.43}$$

and

$$k = \frac{v}{a} = \frac{E[\mathrm{Var}(X_j|\Theta)]}{\mathrm{Var}[E(X_j|\Theta)]}. \tag{16.44}$$

The credibility factor $Z$ in (16.43) with $k$ given by (16.44) is referred to as the
Bühlmann credibility factor. Note that (16.42) is of the form (16.19), and
(16.43) is exactly (16.20). Now, however, we know how to obtain $k$, namely
from (16.44).

Formula (16.42) has many appealing features. First, the credibility
premium (16.42) is a weighted average of the sample mean $\bar{X}$ and the collective
premium $\mu$, a formula which we find desirable. Furthermore, $Z$ approaches
1 as $n$ increases, giving more credit to $\bar{X}$ rather than $\mu$ as more past data
accumulates, a feature which agrees with intuition. Also, if the population
is fairly homogeneous with respect to the risk parameter $\Theta$, then (relatively
speaking) the hypothetical means $\mu(\Theta) = E(X_j|\Theta)$ do not vary greatly with
$\Theta$ (i.e., they are close in value) and hence have small variability. Thus $a$ is
small relative to $v$, that is, $k$ is large and $Z$ is closer to 0. But this agrees with
intuition because for a homogeneous population the overall mean $\mu$ is of more
value in helping to predict next year's claims for a particular policyholder.
Conversely, for a heterogeneous population, the hypothetical means $E(X_j|\Theta)$
are more variable, that is, $a$ is large and $k$ is small, and so $Z$ is closer to 1.
Again this makes sense because in a heterogeneous population the experience
of other policyholders is of less value in predicting the future experience of a
particular policyholder than is the past experience of that policyholder.

We now present some examples. 

Example 16.25 (Example 16.20 continued) Determine the Bühlmann
estimate of $E(X_3|0, 1)$.

From earlier work,

$$\mu(G) = E(X_j|G) = 0.4, \quad \mu(B) = E(X_j|B) = 0.7,$$
$$\pi(G) = 0.75, \quad \pi(B) = 0.25,$$

and, therefore,

$$\mu = \sum_\theta \mu(\theta)\,\pi(\theta) = 0.4(0.75) + 0.7(0.25) = 0.475,$$

$$a = \sum_\theta \mu(\theta)^2\,\pi(\theta) - \mu^2 = 0.16(0.75) + 0.49(0.25) - 0.475^2 = 0.016875.$$

For the process variance,

$$v(G) = \mathrm{Var}(X_j|G) = 0^2(0.7) + 1^2(0.2) + 2^2(0.1) - 0.4^2 = 0.44,$$
$$v(B) = \mathrm{Var}(X_j|B) = 0^2(0.5) + 1^2(0.3) + 2^2(0.2) - 0.7^2 = 0.61,$$
$$v = \sum_\theta v(\theta)\,\pi(\theta) = 0.44(0.75) + 0.61(0.25) = 0.4825.$$

Then (16.44) gives

$$k = \frac{v}{a} = \frac{0.4825}{0.016875} = 28.5926$$

and (16.43) gives

$$Z = \frac{2}{2 + 28.5926} = 0.0654.$$

The expected next value is then 0.0654(0.5) + 0.9346(0.475) = 0.4766. This
is the best linear approximation to the Bayesian premium (given in Example
16.20). □
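
The Bühlmann quantities of Example 16.25 follow mechanically from the conditional distributions, as the following Python sketch illustrates. The model inputs are those of the example; the helper name `conditional_moments` and the variable names are assumptions made for this illustration.

```python
# f(x | theta) for x = 0, 1, 2 and the prior pi(theta), as in Example 16.18.
likelihood = {"G": [0.7, 0.2, 0.1], "B": [0.5, 0.3, 0.2]}
prior = {"G": 0.75, "B": 0.25}


def conditional_moments(theta):
    """Return the hypothetical mean mu(theta) and process variance v(theta)."""
    mean = sum(x * p for x, p in enumerate(likelihood[theta]))
    second = sum(x * x * p for x, p in enumerate(likelihood[theta]))
    return mean, second - mean ** 2


moments = {theta: conditional_moments(theta) for theta in prior}
mu = sum(prior[t] * moments[t][0] for t in prior)                 # (16.36)
v = sum(prior[t] * moments[t][1] for t in prior)                  # (16.37)
a = sum(prior[t] * moments[t][0] ** 2 for t in prior) - mu ** 2   # (16.38)

observations = [0, 1]
n = len(observations)
x_bar = sum(observations) / n
k = v / a                                                         # (16.44)
z = n / (n + k)                                                   # (16.43)
buhlmann_premium = z * x_bar + (1.0 - z) * mu                     # (16.42)
print(mu, v, a, k, z, buhlmann_premium)
```

The result is close to, but not equal to, the Bayesian premium 0.478947 of Example 16.20, as expected of a linear approximation.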

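Example 16.26 Suppose as in Example 16.23 (with $m_j = 1$) that $X_j|\Theta$, $j =
1, \ldots, n$, are independently and identically Poisson distributed with (given)
mean $\theta$ and $\Theta$ is gamma distributed with parameters $\alpha$ and $\beta$. Determine the
Bühlmann premium.

We have

$$\mu(\theta) = E(X_j|\Theta = \theta) = \theta, \quad v(\theta) = \mathrm{Var}(X_j|\Theta = \theta) = \theta,$$

and so

$$\mu = E[\mu(\Theta)] = E(\Theta) = \alpha\beta, \quad v = E[v(\Theta)] = E(\Theta) = \alpha\beta,$$

and

$$a = \mathrm{Var}[\mu(\Theta)] = \mathrm{Var}(\Theta) = \alpha\beta^2.$$

Then

$$k = \frac{v}{a} = \frac{\alpha\beta}{\alpha\beta^2} = \frac{1}{\beta}, \quad Z = \frac{n}{n + k} = \frac{n}{n + 1/\beta} = \frac{n\beta}{n\beta + 1},$$

and the credibility premium is

$$Z\bar{X} + (1 - Z)\mu = \frac{n\beta}{n\beta + 1}\bar{X} + \frac{1}{n\beta + 1}\alpha\beta.$$

But, as shown at the end of Example 16.23, this is also the Bayesian estimate
$E(X_{n+1}|\mathbf{X})$. Thus, the credibility premium equals the Bayesian estimate in
this case. □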


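Example 16.27 Determine the Bühlmann estimate for the setting in Example
16.22.

For this model,

$$\mu(\Theta) = \Theta^{-1}, \quad \mu = E(\Theta^{-1}) = \frac{\beta}{\alpha - 1},$$
$$v(\Theta) = \Theta^{-2}, \quad v = E(\Theta^{-2}) = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)},$$
$$a = \mathrm{Var}(\Theta^{-1}) = \frac{\beta^2}{(\alpha - 1)(\alpha - 2)} - \frac{\beta^2}{(\alpha - 1)^2} = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)},$$
$$k = \frac{v}{a} = \alpha - 1, \quad Z = \frac{n}{n + k} = \frac{n}{n + \alpha - 1},$$
$$P_c = \frac{n}{n + \alpha - 1}\bar{X} + \frac{\alpha - 1}{n + \alpha - 1}\,\frac{\beta}{\alpha - 1},$$

which again matches the Bayesian estimate.

An alternative analysis for this problem could have started with a single
observation of $S = X_1 + \cdots + X_n$. From the assumptions of the problem, $S$
has a mean of $n\Theta^{-1}$ and a variance of $n\Theta^{-2}$. While it is true that $S$ has a
gamma distribution, that information is not needed because the Bühlmann
approximation requires only moments. Following the above calculations,

$$\mu = \frac{n\beta}{\alpha - 1}, \quad v = \frac{n\beta^2}{(\alpha - 1)(\alpha - 2)}, \quad a = \frac{n^2\beta^2}{(\alpha - 1)^2(\alpha - 2)}.$$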


The key is to note that in calculating Z the sample size is now

What if a benefit change occurred part way through a policy year? For group
insurance, what if the size of the group changed over time?

To handle these variations, we consider the following generalization of the
Bühlmann model. Assume that $X_1, \ldots, X_n$ are independent, conditional on
$\Theta$, with common mean (as before)

$$\mu(\theta) = E(X_j \mid \Theta = \theta)$$

but with conditional variances

$$\mathrm{Var}(X_j \mid \Theta = \theta) = \frac{v(\theta)}{m_j},$$

where $m_j$ is a known constant measuring exposure. Note that $m_j$ need only be
proportional to the size of the risk. This model would be appropriate if each



of individuals in the group in past year j, or the amount of premium income
for the policy in past year j.


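As in the Bühlmann model, let

$$\mu = E[\mu(\Theta)], \quad v = E[v(\Theta)],$$

and

$$a = \mathrm{Var}[\mu(\Theta)].$$

Then, for the unconditional moments, from (16.39) $E(X_j) = \mu$, and from
(16.41) $\mathrm{Cov}(X_i, X_j) = a$, but

$$\mathrm{Var}(X_j) = E[\mathrm{Var}(X_j \mid \Theta)] + \mathrm{Var}[E(X_j \mid \Theta)]
= E\left[\frac{v(\Theta)}{m_j}\right] + \mathrm{Var}[\mu(\Theta)]
= \frac{v}{m_j} + a.$$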

To obtain the credibility premium (16.33), we will solve the normal equations
(16.30) and (16.32) to obtain $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$. For notational
convenience, define

$$m = m_1 + m_2 + \cdots + m_n$$

to be the total exposure. Then using (16.39) the unbiasedness equation (16.30)
becomes

$$\mu = \tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j \mu,$$

which implies

$$\sum_{j=1}^{n} \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}. \qquad (16.45)$$

Had we known that (16.48) would be the correct weighting of the $X_j$ to
receive the credibility weight Z, the rest would have been easy. For the single
observation $\bar{X}$ the process variance is

$$\mathrm{Var}(\bar{X} \mid \theta) = \sum_{j=1}^{n} \frac{m_j^2}{m^2}\,\frac{v(\theta)}{m_j} = \frac{v(\theta)}{m},$$

and so the expected process variance is $v/m$. The variance of the hypothetical
means is still $a$ and therefore $k = v/(am)$. There is only one observation of
$\bar{X}$ and so the credibility factor is

$$Z = \frac{1}{1 + v/(am)} = \frac{m}{m + v/a} \qquad (16.49)$$

as before. Equation (16.48) should not have been surprising because the
weights are simply inversely proportional to the (conditional) variance of each
$X_j$.
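
As an illustration of (16.49), the following sketch computes the Bühlmann-Straub credibility factor and premium from per-unit observations and their exposures. The function name and the numerical inputs are hypothetical; the weighted mean is the exposure-weighted average $\bar{X} = m^{-1}\sum m_j X_j$.

```python
def buhlmann_straub(xs, ms, mu, v, a):
    """Return (Z, weighted mean, credibility premium) for per-unit observations xs
    with exposures ms, given the structural parameters mu, v, and a."""
    m = sum(ms)                                         # total exposure
    xbar = sum(mj * xj for mj, xj in zip(ms, xs)) / m   # exposure-weighted mean
    z = m / (m + v / a)                                 # credibility factor (16.49)
    return z, xbar, z * xbar + (1 - z) * mu

# Hypothetical data: claims per policy in three years with 100, 200, and 100 policies.
z, xbar, premium = buhlmann_straub(
    xs=[0.10, 0.15, 0.05], ms=[100, 200, 100], mu=0.12, v=0.12, a=0.0004
)
print(z, xbar, premium)

# With unit exposures the factor reduces to the Bühlmann form n / (n + k).
z_unit, _, _ = buhlmann_straub(xs=[1, 2, 3], ms=[1, 1, 1], mu=2.0, v=4.0, a=2.0)
```

Note that only the ratio $v/a$ and the total exposure $m$ affect $Z$; the individual exposures matter only through the weighted mean.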

Example 16.28 As in Example 16.23, assume that in year j there are $N_j$
claims from $m_j$ policies, $j = 1, \ldots, n$. An individual policy has the Poisson
distribution with parameter $\Theta$ and the parameter itself has the gamma
distribution with parameters $\alpha$ and $\beta$. Determine the Bühlmann-Straub
estimate of the number of claims in year $n + 1$ if there will be $m_{n+1}$ policies.

In order to meet the conditions of this model, let $X_j = N_j/m_j$. Because
$N_j$ has the Poisson distribution with mean $m_j\Theta$, $E(X_j \mid \Theta) = \Theta = \mu(\Theta)$ and
$\mathrm{Var}(X_j \mid \Theta) = \Theta/m_j = v(\Theta)/m_j$. Then,

$$\mu = E(\Theta) = \alpha\beta, \quad a = \mathrm{Var}(\Theta) = \alpha\beta^2, \quad v = E(\Theta) = \alpha\beta,$$
$$k = \frac{1}{\beta}, \quad Z = \frac{m}{m + 1/\beta} = \frac{m\beta}{m\beta + 1},$$

and the estimate for one policyholder is

$$P_c = \frac{m\beta}{m\beta + 1}\bar{X} + \frac{1}{m\beta + 1}\alpha\beta,$$

where $\bar{X} = m^{-1}\sum_{j=1}^{n} m_j X_j$. For year $n + 1$, the estimate is $m_{n+1}P_c$,
matching the answer to Example 16.23. □


The assumptions underlying the Bühlmann-Straub model may be too restrictive
to represent reality. In a 1967 paper, Hewitt observed that large


observation

Example 16.29 Let the conditional mean be $E(X_j \mid \Theta) = \mu(\Theta)$ and the
conditional variance be $\mathrm{Var}(X_j \mid \Theta) = w(\Theta) + v(\Theta)/m_j$. Further assume that
$X_1, \ldots, X_n$ are conditionally independent given $\Theta$. Show that this model
supports Hewitt's observation and determine the credibility premium.

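Consider independent risks i and j with exposures $m_i$ and $m_j$ and with a
common value of $\Theta$. When aggregated, the variance of the average loss is

$$\mathrm{Var}\left(\frac{m_i X_i + m_j X_j}{m_i + m_j} \,\Big|\, \Theta\right)
= \left(\frac{m_i}{m_i + m_j}\right)^2 \mathrm{Var}(X_i \mid \Theta) + \left(\frac{m_j}{m_i + m_j}\right)^2 \mathrm{Var}(X_j \mid \Theta)
= \frac{m_i^2 + m_j^2}{(m_i + m_j)^2}\,w(\Theta) + \frac{1}{m_i + m_j}\,v(\Theta),$$

while a single risk with exposure $m_i + m_j$ has variance $w(\Theta) + v(\Theta)/(m_i + m_j)$,
which is larger.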

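With regard to the credibility premium, we have

$$E(X_j) = E[E(X_j \mid \Theta)] = E[\mu(\Theta)] = \mu,$$
$$\mathrm{Var}(X_j) = E[\mathrm{Var}(X_j \mid \Theta)] + \mathrm{Var}[E(X_j \mid \Theta)]
= E\left[w(\Theta) + \frac{v(\Theta)}{m_j}\right] + \mathrm{Var}[\mu(\Theta)]
= w + \frac{v}{m_j} + a,$$

and for $i \ne j$, $\mathrm{Cov}(X_i, X_j) = a$ as in (16.41). The unbiasedness equation is
still

$$\mu = \tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j \mu, \quad \text{which implies} \quad \sum_{j=1}^{n} \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu}.$$

Equation (16.32) becomes

$$a = \sum_{j=1}^{n} \tilde{\alpha}_j a + \tilde{\alpha}_i\left(w + \frac{v}{m_i}\right)
= a\left(1 - \frac{\tilde{\alpha}_0}{\mu}\right) + \tilde{\alpha}_i\left(w + \frac{v}{m_i}\right), \quad i = 1, \ldots, n.$$

Therefore,

$$\tilde{\alpha}_i = \frac{a\tilde{\alpha}_0/\mu}{w + v/m_i}.$$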


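Summing both sides yields

$$\frac{a\tilde{\alpha}_0}{\mu}\sum_{j=1}^{n} \frac{m_j}{v + w m_j} = \sum_{j=1}^{n} \tilde{\alpha}_j = 1 - \frac{\tilde{\alpha}_0}{\mu},$$

and so

$$\tilde{\alpha}_0 = \frac{\mu}{1 + a m^*}, \quad \text{where} \quad m^* = \sum_{j=1}^{n} \frac{m_j}{v + w m_j}.$$

Then

$$\tilde{\alpha}_j = \frac{a m_j}{v + w m_j}\,\frac{1}{1 + a m^*}.$$

The credibility premium is

$$\frac{\mu}{1 + a m^*} + \frac{a}{1 + a m^*}\sum_{j=1}^{n} \frac{m_j X_j}{v + w m_j}.$$

The sum can be made to define a weighted average of the observations by
letting

$$\bar{X} = \frac{\sum_{j=1}^{n} \frac{m_j}{v + w m_j} X_j}{\sum_{j=1}^{n} \frac{m_j}{v + w m_j}} = \frac{1}{m^*}\sum_{j=1}^{n} \frac{m_j}{v + w m_j} X_j.$$

If we now set

$$Z = \frac{a m^*}{1 + a m^*},$$

the credibility premium is

$$Z\bar{X} + (1 - Z)\mu.$$

Observe what happens as the exposures $m_j$ go to infinity. The credibility
factor becomes

$$Z \to \frac{an/w}{1 + an/w} < 1.$$

Contrast this to the Bühlmann-Straub model where the limit is 1. Thus,
no matter how large the risk, there is a limit to its credibility. A further
generalization of this result is provided in Exercise 16.26. □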
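
The ceiling on credibility in Example 16.29 can be seen numerically. The sketch below evaluates $Z = am^*/(1 + am^*)$ with $m^* = \sum m_j/(v + w m_j)$ for increasing exposures; the function names and parameter values are hypothetical.

```python
def capped_credibility(ms, w, v, a):
    """Credibility factor when Var(X_j | Theta) = w(Theta) + v(Theta) / m_j."""
    m_star = sum(m / (v + w * m) for m in ms)
    return a * m_star / (1 + a * m_star)

def limiting_credibility(n, w, a):
    """Limit of the credibility factor as all n exposures grow without bound."""
    return (a * n / w) / (1 + a * n / w)

n, w, v, a = 3, 0.5, 2.0, 0.25            # hypothetical structural parameters
exposures = [1, 10, 100, 10_000, 1_000_000_000]
zs = [capped_credibility([m] * n, w, v, a) for m in exposures]
limit = limiting_credibility(n, w, a)
for m, z in zip(exposures, zs):
    print(f"m_j = {m:>13,}  Z = {z:.6f}  (limit {limit:.6f})")

# Setting w = 0 recovers the Bühlmann-Straub factor m / (m + v / a).
z_bs = capped_credibility([100, 200], w=0.0, v=v, a=a)
```

With $w = 0$ the extra variance term disappears and the factor can approach 1, which is the contrast drawn at the end of the example.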


Another generalization is provided by letting the variance of $\mu(\Theta)$ depend
on the exposure. This may be reasonable if we believe that the extent to which




to its size. For example, larger risks may be underwritten more carefully. In
this case, extreme variations from the mean are less likely because we ensure
that the risk not only meets the underwriting requirements but also appears
to be exactly what it claims to be.

Example 16.30 (Example 16.29 continued) In addition to the specification
presented in Example 16.29, let $\mathrm{Var}[\mu(\Theta)] = a + b/m$, where $m = \sum_{j=1}^{n} m_j$
is the total exposure for the group. Develop the credibility formula.

We now have

$$E(X_j) = E[E(X_j \mid \Theta)] = E[\mu(\Theta)] = \mu,$$
$$\mathrm{Var}(X_j) = E[\mathrm{Var}(X_j \mid \Theta)] + \mathrm{Var}[E(X_j \mid \Theta)]
= E\left[w(\Theta) + \frac{v(\Theta)}{m_j}\right] + \mathrm{Var}[\mu(\Theta)]
= w + \frac{v}{m_j} + a + \frac{b}{m},$$

and for $i \ne j$

$$\mathrm{Cov}(X_i, X_j) = E[E(X_i X_j \mid \Theta)] - \mu^2 = E[\mu(\Theta)^2] - \mu^2 = a + \frac{b}{m}.$$

It can be seen that all the calculations used in Example 16.29 apply here
with $a$ replaced by $a + b/m$. The credibility factor is

$$Z = \frac{(a + b/m)m^*}{1 + (a + b/m)m^*}$$

and the credibility premium is

$$Z\bar{X} + (1 - Z)\mu$$

with $\bar{X}$ and $m^*$ defined as in Example 16.29. This particular credibility
formula has been used in workers compensation experience rating. One example
of this is presented in detail in [45]. □

16.4.6 Exact credibility 

In Examples 16.26—16.28 we found that the credibility premium and the 
Bayesian premium were equal. From (16.34), one may view the credibility 
premium as the best linear approximation to the Bayesian premium in the 
sense of squared error loss. In these examples the approximation is exact
to describe the situation when the credibility premium equals the Bayesian
premium.

In fact, it is not hard to see that one can ascertain whether credibility
is exact without even calculating the credibility premium. If the Bayesian
premium is a linear function of $X_1, \ldots, X_n$,

$$E(X_{n+1} \mid \mathbf{X}) = a_0 + \sum_{j=1}^{n} a_j X_j,$$

then it is clear that in (16.34) the quantity $Q_1$ attains its minimum value
of zero with $\tilde{\alpha}_j = a_j$ for $j = 0, 1, \ldots, n$. Thus the credibility premium is
$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = a_0 + \sum_{j=1}^{n} a_j X_j = E(X_{n+1} \mid \mathbf{X})$ and credibility is exact.

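This phenomenon occurs fairly generally in connection with linear exponential
family members (Section 12.4.3) and their conjugate priors. We parameterize
such that $X_j \mid \Theta = \theta$ is independently (conditional on $\Theta = \theta$) distributed
with pf, for $j = 1, \ldots, n + 1$,

$$f_{X_j \mid \Theta}(x_j \mid \theta) = \frac{p(x_j)e^{-\theta x_j}}{q(\theta)},$$

and $\Theta$ has pdf

$$\pi(\theta) = \frac{[q(\theta)]^{-k}e^{-\mu k\theta}}{c(\mu, k)}, \quad \theta_0 < \theta < \theta_1, \qquad (16.50)$$

where $-\infty \le \theta_0 < \theta_1 \le \infty$. It is also assumed that $\pi(\theta_0) = \pi(\theta_1) = 0$. For
the moment, $\mu$ and $k$ are simply parameters of $\pi(\theta)$. We will now demonstrate
that the choice of symbols was no coincidence.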

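In Section 12.4.3 it was shown that

$$\mu(\theta) = E(X_j \mid \Theta = \theta) = -\frac{q'(\theta)}{q(\theta)}.$$

We wish to find $E[\mu(\Theta)]$. From (16.50),

$$\ln \pi(\theta) = -k \ln q(\theta) - \mu k\theta - \ln c(\mu, k)$$

and differentiating with respect to $\theta$ gives

$$\frac{\pi'(\theta)}{\pi(\theta)} = -\frac{k q'(\theta)}{q(\theta)} - \mu k.$$

In other words,

$$\pi'(\theta) = k[\mu(\theta) - \mu]\pi(\theta) \qquad (16.51)$$

and integrating from $\theta_0$ to $\theta_1$ gives

$$\pi(\theta_1) - \pi(\theta_0) = k\int_{\theta_0}^{\theta_1} \mu(\theta)\pi(\theta)\,d\theta - k\mu\int_{\theta_0}^{\theta_1} \pi(\theta)\,d\theta.$$

Because $\pi(\theta_0) = \pi(\theta_1) = 0$, the left-hand side is zero and so $E[\mu(\Theta)] = \mu$.

Differentiation of (16.51) yields

$$\pi''(\theta) = k\mu'(\theta)\pi(\theta) + k^2[\mu(\theta) - \mu]^2\pi(\theta)
= -k v(\theta)\pi(\theta) + k^2[\mu(\theta) - \mu]^2\pi(\theta).$$

Integration with respect to $\theta$ from $\theta_0$ to $\theta_1$ yields

$$\pi'(\theta_1) - \pi'(\theta_0) = -kE[v(\Theta)] + k^2E\{[\mu(\Theta) - \mu]^2\} = -kv + k^2 a$$

because $\mu(\Theta)$ has mean $\mu$ and $E\{[\mu(\Theta) - \mu]^2\} = \mathrm{Var}[\mu(\Theta)] = a$. If $\pi'(\theta_1) =
\pi'(\theta_0) = 0$, this implies that $k = v/a$, and so (16.44) is satisfied.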


16.4.7 Linear versus Bayesian versus no credibility 

In Section 16.4.3 it was demonstrated that the credibility premium is the best
linear estimator in the sense of minimizing the expected squared error with
respect to the next observation, $X_{n+1}$. In Exercise 16.59 you are asked to
demonstrate that the Bayesian premium is the best estimator with no
restrictions, in the same least squares sense. It was also demonstrated in Section
16.4.3 that the credibility premium is the linear estimator that is closest to
the Bayesian estimator, again in the mean squared error sense. Finally, we
have seen that in a number of cases the credibility and Bayesian premiums
are the same. This leaves two questions. Is the additional error caused by
using the credibility premium in place of the Bayesian premium worth
worrying about? Is it worthwhile to go through the bother of using credibility in
the first place? While the exact answer to these questions depends on the
underlying distributions, we can obtain some feel for the answers by considering
two examples.

We begin with the second question and use a common situation that has
already been discussed. What makes credibility work is that we expect to
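perform numerous estimations. As a result, we are willing to be biased in

$\Theta = (\Theta_1, \ldots, \Theta_{50})$,

$$S_1 = \sum_{j=1}^{50} (\bar{X}_j - \Theta_j)^2,$$

and the mean-squared error is

$$E(S_1) = E[E(S_1 \mid \Theta)] = E\left[\sum_{j=1}^{50} \mathrm{Var}(\bar{X}_j \mid \Theta_j)\right] = E\left(\sum_{j=1}^{50} \frac{\Theta_j}{10}\right) = 50(0.5) = 25.$$

Using the credibility estimator, the squared error is

$$S_2 = \sum_{j=1}^{50} (0.5\bar{X}_j + 2.5 - \Theta_j)^2$$

and the mean-squared error is

$$E(S_2) = E[E(S_2 \mid \Theta)]
= E\left[\sum_{j=1}^{50} E(0.25\bar{X}_j^2 + 6.25 + \Theta_j^2 + 2.5\bar{X}_j - \bar{X}_j\Theta_j - 5\Theta_j \mid \Theta_j)\right]$$
$$= E\left\{\sum_{j=1}^{50} \left[0.25\left(\frac{\Theta_j}{10} + \Theta_j^2\right) + 6.25 + \Theta_j^2 + 2.5\Theta_j - \Theta_j^2 - 5\Theta_j\right]\right\}$$
$$= \sum_{j=1}^{50} [0.25(0.5 + 25.5) + 6.25 + 25.5 + 2.5(5) - 5(5) - 25.5]
= 12.5.$$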


Of course, we “cheated” a bit. We used squared error as our criterion and
so knew in advance that the Bühlmann estimate would have the smaller value
given that it is competing against another linear estimator. The interesting
part is the significant improvement that resulted. This means that, even if the
components of the credibility formula Z and $\mu$ were not set at their optimal
values, the credibility formula is still likely to result in an improvement.

To get a feel for how this improvement comes about, consider a specific set
of 50 values of $\theta_j$. The ones presented in Table 16.3 are a random sample from
the prior gamma distribution sorted in increasing order. The next column
provides the mean-squared error of the sample mean ($\theta_j/10$). The final three
columns provide the bias, variance, and mean-squared error for the credibility
estimator based on Z = 0.5 and $\mu$ = 5. The sample mean is always unbiased
and therefore the variance matches the mean-squared error and so these two
quantities are not presented. For the credibility estimator,

$$\text{Bias} = E(0.5\bar{X}_j + 2.5 - \theta_j) = 2.5 - 0.5\theta_j,$$
$$\text{Variance} = \mathrm{Var}(0.5\bar{X}_j + 2.5) = 0.25\,\frac{\theta_j}{10} = 0.025\theta_j,$$
$$\text{Mean-squared error} = \text{bias}^2 + \text{variance} = 0.25\theta_j^2 - 2.475\theta_j + 6.25.$$

Table 16.3 A comparison of the sample mean and the credibility estimator

             X-bar      0.5 X-bar + 2.5
  theta      MSE        Bias     Var     MSE
  3.510                 .745    .088    .643
  3.637                 .681    .091    .555
  3.742                 .629    .094    .489
  3.764                 .618    .094    .476
  3.793                 .604    .095    .459
  4.000                 .500    .100    .350
  4.151                 .424    .104    .284
  4.153                 .424    .104    .283
  4.291                 .354    .107    .233
  4.405                 .298    .110    .199
  4.410                 .295    .110    .197
  4.413                 .293    .110    .196
  4.430                 .285    .111    .192
  4.438                 .281    .111    .190
  4.471                 .264    .112    .182
  4.491      .449       .254    .112    .177
  4.495      .449       .253    .112    .176
  4.505      .451       .247    .113    .174
  4.547      .455       .227    .114    .165
  4.606      .461       .197    .115    .154
  4.654      .465       .173    .116    .146
  4.758      .476       .121    .119    .134
  4.763      .476       .118    .119    .133
  4.766      .477       .117    .119    .133
  4.796      .480       .102    .120    .130
  4.875      .488       .062    .122    .126
  4.894      .489       .053    .122    .125
  4.900      .490       .050    .123    .125
  4.943      .494       .028    .124    .124
  4.977      .498       .012    .124    .125
  5.002      .500      -.001    .125    .125
  5.013      .501      -.006    .125    .125
  5.108      .511      -.054    .128    .131
  5.172      .517      -.086    .129    .137
  5.198      .520      -.099    .130    .140
  5.231      .523      -.116    .131    .144
  5.239      .524      -.120    .131    .145
  5.263      .526      -.132    .132    .149
  5.300      .530      -.150    .132    .155
  5.338      .534      -.169    .133    .162
  5.400      .540      -.200    .135    .175
  5.407      .541      -.203    .135    .176
  5.431      .543      -.215    .136    .182
  5.459      .546      -.229    .136    .189
  5.510      .551      -.255    .138    .203
  5.538      .554      -.269    .138    .211
  5.646      .565      -.323    .141    .246
  5.837      .584      -.419    .146    .321
  5.937      .594      -.468    .148    .368
  6.263      .626      -.631    .157    .555
  Mean       .482       .091    .120    .222

We see that, as expected, the average mean-squared error is much lower for
the credibility estimator, and this is achieved by allowing for some bias in the
individual estimators. Further note that the credibility estimator is at its best
near the mean of the prior distribution (5). □
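
The bias, variance, and mean-squared error formulas of this example can be evaluated directly. The sketch below assumes the setting implied by the calculations above (50 risks, sample means with conditional variance $\theta_j/10$, Z = 0.5, prior mean 5, and $E(\Theta^2) = 25.5$); the function and constant names are illustrative.

```python
PRIOR_MEAN = 5.0             # E(Theta)
PRIOR_SECOND_MOMENT = 25.5   # E(Theta^2)
N_YEARS = 10                 # Var(X-bar | theta) = theta / N_YEARS
N_RISKS = 50
Z = 0.5

def mse_sample_mean(theta):
    """The sample mean is unbiased, so its MSE equals its variance."""
    return theta / N_YEARS

def cred_bias(theta):
    """Bias of Z * X-bar + (1 - Z) * mu as an estimator of theta."""
    return (1 - Z) * (PRIOR_MEAN - theta)

def cred_variance(theta):
    return Z ** 2 * theta / N_YEARS

def cred_mse(theta):
    return cred_bias(theta) ** 2 + cred_variance(theta)

# Expected total squared errors over all risks, averaging over the prior.
expected_s1 = N_RISKS * PRIOR_MEAN / N_YEARS
expected_s2 = N_RISKS * (
    (1 - Z) ** 2 * (PRIOR_SECOND_MOMENT - 2 * PRIOR_MEAN ** 2 + PRIOR_MEAN ** 2)
    + Z ** 2 * PRIOR_MEAN / N_YEARS
)

for theta in (3.510, 4.875, 6.263):
    print(theta, round(mse_sample_mean(theta), 3), round(cred_bias(theta), 3),
          round(cred_variance(theta), 3), round(cred_mse(theta), 3))
print(expected_s1, expected_s2)
```

The per-risk values reproduce rows of Table 16.3 up to rounding, and the two totals show the credibility estimator halving the expected squared error.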

We have seen that there is real value in using credibility. Our next task
is to compare the linear credibility estimator to the Bayesian estimator. In
most examples, this is difficult because the Bayesian estimates must be ob-


mean squared errors by simulation. This approach is taken in an illustration 
presented in Foundations of Casualty Actuarial Science [24], p. 467. In the 
following example we use the same illustration but employ an approximation 
that avoids approximate integration. It should also be noted that the linear 
credibility approach requires only assumptions or estimation of the first two 
moments while the Bayesian approach requires the distributions to be completely
specified. This nonparametric feature makes the linear approach more
robust, which may compensate for any loss of accuracy. 

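Example 16.32 Individual observations are samples of size 25 from an inverse
gamma distribution with $\alpha = 4$ and unknown scale parameter $\Theta$. The
prior distribution for $\Theta$ is gamma with mean 50 and variance 5,000. Compare
the linear credibility and Bayesian estimators.

For the Bühlmann linear credibility estimator we have

$$\mu = E[\mu(\Theta)] = E\left(\frac{\Theta}{3}\right) = \frac{50}{3},$$
$$a = \mathrm{Var}[\mu(\Theta)] = \mathrm{Var}\left(\frac{\Theta}{3}\right) = \frac{5{,}000}{9},$$
$$v = E[v(\Theta)] = E\left(\frac{\Theta^2}{18}\right) = \frac{5{,}000 + 50^2}{18} = \frac{7{,}500}{18},$$

and so

$$Z = \frac{25}{25 + \dfrac{7{,}500/18}{5{,}000/9}} = \frac{100}{103}$$

and the credibility estimator is $\hat{\mu}_{\text{cred}} = (100\bar{X} + 50)/103$.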

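For the Bayesian estimator, the posterior density is

$$\pi_{\Theta \mid \mathbf{X}}(\theta \mid \mathbf{x}) \propto e^{-\theta\sum_{j=1}^{25} x_j^{-1}}\,\theta^{100}\,\theta^{-0.5}e^{-\theta/100}
\propto \theta^{99.5}e^{-\theta\left(0.01 + \sum_{j=1}^{25} x_j^{-1}\right)},$$

which is a gamma density with parameters 100.5 and $\left(0.01 + \sum_{j=1}^{25} x_j^{-1}\right)^{-1}$.

The posterior mean is

$$\hat{\mu}_{\text{Bayes}} = \frac{100.5}{3\left(0.01 + \sum_{j=1}^{25} x_j^{-1}\right)},$$

which is clearly a nonlinear estimator.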

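With regard to accuracy, we can also consider the sample mean. Given the
value of $\theta$, the sample mean is unbiased with variance and mean squared error
$\theta^2/(18 \times 25) = \theta^2/450$. For the credibility estimator the bias is

$$\text{bias}_\theta(\hat{\mu}_{\text{cred}}) = \frac{100\theta}{309} + \frac{50}{103} - \frac{\theta}{3} = \frac{50 - \theta}{103},$$

the variance is

$$\mathrm{Var}_\theta(\hat{\mu}_{\text{cred}}) = \frac{(100/103)^2\theta^2}{450},$$

and the mean-squared error is

$$\mathrm{MSE}_\theta(\hat{\mu}_{\text{cred}}) = \frac{1}{103^2}\left(2{,}500 - 100\theta + \frac{10{,}450\,\theta^2}{450}\right).$$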
For the Bayes estimate we note that, given $\Theta = \theta$, $1/X$ has a gamma
distribution with parameters 4 and $1/\theta$, and so $\sum_{j=1}^{25} X_j^{-1}$ has a gamma
distribution with parameters 100 and $1/\theta$. We note that in the denominator
of $\hat{\mu}_{\text{Bayes}}$, the term 0.01 will usually be small relative to the sum. An
approximation can be created by ignoring this term, in which case $\hat{\mu}_{\text{Bayes}}$ has
approximately an inverse gamma distribution with parameters 100 and $33.5\theta$.
Then

$$\text{bias}_\theta(\hat{\mu}_{\text{Bayes}}) = \frac{33.5\theta}{99} - \frac{\theta}{3} = \frac{0.5\theta}{99} = \frac{\theta}{198},$$
$$\mathrm{Var}_\theta(\hat{\mu}_{\text{Bayes}}) = \frac{33.5^2\theta^2}{99^2(98)},$$
$$\mathrm{MSE}_\theta(\hat{\mu}_{\text{Bayes}}) = \frac{33.5^2\theta^2}{99^2(98)} + \frac{\theta^2}{198^2} = 0.00119\theta^2.$$

If we compare the coefficients of $\theta^2$ in the MSE for the three estimators,
we see that they are 0.00222 for the sample mean, 0.00219 for the credibility
estimator, and 0.00119 for the Bayesian estimator. Thus for large $\theta$ the
credibility estimator is not much of an improvement over the sample mean,
but the Bayesian estimator cuts the mean squared error about in half.
Calculated values of these quantities for various percentiles from the gamma prior
distribution appear in Table 16.4. □

Table 16.4 A comparison of the sample mean, credibility, and Bayes estimators

                          X-bar        mu-hat (cred)          mu-hat (Bayes)
  Percentile    theta     MSE          Bias       MSE         Bias      MSE
   1            0.008     0.000        0.485      0.236       0.000     0.000
   5            0.197     0.000        0.484      0.234       0.001     0.000
  10            0.790     0.001        0.478      0.230       0.004     0.001
  25            5.077     0.057        0.436      0.244       0.026     0.031
  50           22.747     1.150        0.265      1.154       0.115     0.618
  75           66.165     9.729       -0.157      9.195       0.334     5.227
  90          135.277    40.667       -0.828     39.018       0.683    21.849
  95          192.072    81.982       -1.379     79.178       0.970    44.046
  99          331.746   244.568       -2.735    238.011       1.675   131.397
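
The entries of Table 16.4 follow from the closed-form bias and variance expressions of Example 16.32. The following sketch evaluates them; the function names are illustrative, and the Bayes figures rely on the same approximation as the example (the 0.01 term in the denominator is ignored).

```python
N = 25  # observations per risk

def mse_sample_mean(theta):
    # Inverse gamma with alpha = 4 has variance theta^2 / 18.
    return theta ** 2 / (18 * N)

def cred_bias(theta):
    return (50 - theta) / 103

def cred_mse(theta):
    variance = (100 / 103) ** 2 * theta ** 2 / 450
    return cred_bias(theta) ** 2 + variance

def bayes_bias(theta):
    return 33.5 * theta / 99 - theta / 3

def bayes_mse(theta):
    variance = 33.5 ** 2 * theta ** 2 / (99 ** 2 * 98)
    return variance + bayes_bias(theta) ** 2

# Coefficients of theta^2 in the three mean-squared errors.
coef_mean = 1 / 450
coef_cred = (1 + 100 ** 2 / 450) / 103 ** 2
coef_bayes = 33.5 ** 2 / (99 ** 2 * 98) + 1 / 198 ** 2

for theta in (5.077, 22.747, 331.746):
    print(theta, round(mse_sample_mean(theta), 3), round(cred_bias(theta), 3),
          round(cred_mse(theta), 3), round(bayes_bias(theta), 3),
          round(bayes_mse(theta), 3))
```

Comparing the printed rows with the 25th, 50th, and 99th percentile rows of the table gives a quick consistency check on the formulas.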


use linear credibility to estimate the mean of the distribution of logarithms. 
The result is then exponentiated. Because this procedure is sure to introduce 
bias, 5 a multiplicative adjustment is made. The results are presented in the 
following example with many of the details left for Exercise 16.57. 

Example 16.33 (Example 16.32 continued) Obtain the log-credibility
estimator and evaluate its bias and mean-squared error.

Let $W_j = \ln X_j$. Then for the credibility on the logarithms

$$\mu(\Theta) = E(W \mid \Theta)
= \int_0^\infty (\ln x)\,\Theta^4 x^{-5} e^{-\Theta/x}\,\frac{1}{6}\,dx
= \int_0^\infty (\ln \Theta - \ln y)\,y^3 e^{-y}\,\frac{1}{6}\,dy
= \ln \Theta - \psi(4),$$

where the second integral was obtained using the substitution $y = \Theta/x$. The
last line follows from observing that the term $y^3 e^{-y}/6$ is a gamma density
and thus integrates to 1 while the second term is the digamma function (see
Exercise 16.57) and using tables in [3] we have $\psi(4) = 1.25612$. The next


5 By Jensen’s inequality, E[ln X] < ln E(X), and therefore this procedure will underestimate
the true value.


required quantity is 




i distribution produce 


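$$\frac{50}{3} = E\left(c\,e^{0.00318024}\,e^{0.997705\bar{W}}\right)
= c\,e^{0.00318024}\,E\left(\prod_{j=1}^{25} X_j^{0.997705/25}\right)$$
$$= c\,e^{0.00318024}\left[\frac{\Gamma(4 - 0.997705/25)}{\Gamma(4)}\right]^{25}\frac{100^{0.997705}\,\Gamma(0.5 + 0.997705)}{\Gamma(0.5)},$$

which produces c = 1.169318 and

$$\hat{\mu}_{\text{log-cred}} = 1.173043(2.712051)^{\bar{W}}.$$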


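In order to evaluate the bias and mean-squared error for a given value of
$\Theta$, we must obtain

$$E(\hat{\mu}_{\text{log-cred}} \mid \Theta = \theta) = 1.173043\,E\left(e^{\bar{W}\ln 2.712051} \mid \Theta = \theta\right)
= 1.173043\,E\left(\prod_{j=1}^{25} X_j^{(\ln 2.712051)/25} \,\Big|\, \Theta = \theta\right)$$
$$= 1.173043\,\theta^{\ln 2.712051}\left[\frac{\Gamma\left(4 - \frac{\ln 2.712051}{25}\right)}{\Gamma(4)}\right]^{25}$$

and

$$E(\hat{\mu}^2_{\text{log-cred}} \mid \Theta = \theta) = 1.173043^2\,E\left(e^{2\bar{W}\ln 2.712051} \mid \Theta = \theta\right)
= 1.173043^2\,\theta^{2\ln 2.712051}\left[\frac{\Gamma\left(4 - \frac{2\ln 2.712051}{25}\right)}{\Gamma(4)}\right]^{25}.$$

The measures of quality are then

$$\text{bias}_\theta(\hat{\mu}_{\text{log-cred}}) = E(\hat{\mu}_{\text{log-cred}} \mid \Theta = \theta) - \tfrac{1}{3}\theta,$$
$$\mathrm{MSE}_\theta(\hat{\mu}_{\text{log-cred}}) = E(\hat{\mu}^2_{\text{log-cred}} \mid \Theta = \theta) - [E(\hat{\mu}_{\text{log-cred}} \mid \Theta = \theta)]^2 + [\text{bias}_\theta(\hat{\mu}_{\text{log-cred}})]^2.$$

Values of these quantities are calculated for various values of $\theta$ in Table 16.5.
A comparison with Table 16.4 indicates that the log-credibility estimator is
almost as good as the Bayes estimator. □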
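
The constants of Example 16.33 can be recovered with the standard library alone. The sketch below is built on assumptions not spelled out in the surviving text: the claim distribution is inverse gamma with $\alpha = 4$ and scale $\Theta$, the prior is gamma with shape 0.5 and scale 100 (mean 50, variance 5,000), and the digamma and trigamma values at 4 and 1/2 are taken from their well-known closed forms. All variable names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329
psi_4 = 1 + 1 / 2 + 1 / 3 - EULER_GAMMA            # digamma(4)
psi_half = -EULER_GAMMA - 2 * math.log(2)          # digamma(1/2)
trigamma_4 = math.pi ** 2 / 6 - (1 + 1 / 4 + 1 / 9)
trigamma_half = math.pi ** 2 / 2

n = 25
# Structural parameters for W = ln X, where mu(Theta) = ln(Theta) - digamma(4).
mu_w = math.log(100) + psi_half - psi_4            # E[mu(Theta)]
v_w = trigamma_4                                   # Var(W | Theta), free of Theta
a_w = trigamma_half                                # Var(ln Theta)
Z = n / (n + v_w / a_w)
intercept = (1 - Z) * mu_w

# E[exp(Z * W-bar) | theta] = theta**Z * exp(log_G) from inverse gamma moments.
log_G = n * (math.lgamma(4 - Z / n) - math.lgamma(4))
# E[Theta**Z] under the gamma prior with shape 0.5 and scale 100.
log_prior_moment = Z * math.log(100) + math.lgamma(0.5 + Z) - math.lgamma(0.5)
# Choose c so that the estimator has the correct overall mean E(Theta / 3) = 50 / 3.
c = (50 / 3) / math.exp(intercept + log_G + log_prior_moment)
multiplier = c * math.exp(intercept)

def bias(theta):
    """Conditional bias of multiplier * exp(Z * W-bar) as an estimator of theta / 3."""
    return multiplier * theta ** Z * math.exp(log_G) - theta / 3

print(Z, intercept, c, multiplier, math.exp(Z))
print(bias(22.747), bias(331.746))
```

Under these assumptions the script returns values close to 0.997705, 0.00318024, 1.169318, 1.173043, and 2.712051, and the two biases agree with the 50th and 99th percentile rows of Table 16.5.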


In practice, log-credibility is as easy to use as ordinary credibility. In either 
case, one of the computational methods of the next section would be used. 
For log-credibility, the logarithms of the observations are substituted for the 
observed values and then the final estimate is exponentiated. The bias is 
corrected by multiplying all the estimates by a constant such that the sample 
mean of the estimates matches the sample mean of the original data.

Table 16.5 Bias and mean squared error for the log-credibility estimator

  Percentile    theta      Bias       MSE
   1            0.008      0.000      0.000
   5            0.197      0.001      0.000
  10            0.790      0.003      0.001
  25            5.077      0.012      0.034
  50           22.747      0.026      0.666
  75           66.165      0.023      5.604
  90          135.277     -0.028     23.346
  95          192.072     -0.091     46.995
  99          331.746     -0.295    139.908


16.4.8 Notes and References 

In this section, one of the two major criticisms of limited fluctuation credibility
has been addressed. Through the use of the variance of the hypothetical
means, we now have a means of relating the mean of the group of interest,
$\mu(\Theta)$, to the manual, or collective, premium, $\mu$. The development was also
mathematically sound in that the results followed directly from a specific
model and objective. We have also seen that the additional restriction of a
linear solution was not as bad as it might be in that often we still obtain
the exact Bayesian solution. There has subsequently been a great deal of
effort expended to generalize the model. With a sound basis for obtaining a
credibility premium, we have but one remaining obstacle: how to numerically
estimate the quantities $a$ and $v$ in the Bühlmann formulation, or how to
specify the prior distribution in the Bayesian formulation. Those matters are
addressed in the final section of this chapter.

A historical review of credibility theory including a description of the limited
fluctuation and greatest accuracy approaches is provided by Norberg
[100]. Since the classic paper of Bühlmann [18], there has developed a vast
literature on credibility theory in the actuarial literature. Other elementary
introductions are given by Herzog [52] and Waters [135]. Other more advanced
treatments are Goovaerts and Hoogstad [46] and Sundt [127]. An important
generalization of the Bühlmann-Straub model is the Hachemeister [48]
regression model, which was not discussed here. See also Klugman [76]. The
material on exact credibility is taken from Jewell [66]. See also Ericson [34].
A special issue of Insurance: Abstracts and Reviews (Sundt [126]) contains an
extensive list of papers on credibility.




(a) Determine $\pi(\theta)$ for each of the six die-spinner combinations.

(b) Determine the conditional distributions $f_{X \mid \Theta}(x \mid \theta)$ for the claim
sizes for each die-spinner combination.

(c) Determine the hypothetical means $\mu(\theta)$ and the process variances
$v(\theta)$ for each $\theta$.

(d) Determine the marginal probability that the claim $X_1$ on the first
iteration equals 3.

(e) Determine the posterior distribution $\pi_{\Theta \mid X_1}(\theta \mid 3)$ of $\Theta$ using Bayes'
theorem.

(f) Use (16.25) to determine the conditional distribution $f_{X_2 \mid X_1}(x_2 \mid 3)$
of the claims $X_2$ on the second iteration given that $X_1 = 3$ was
observed on the first iteration.

(g) Use (16.28) to determine the Bayesian premium $E(X_2 \mid X_1 = 3)$.

(h) Determine the joint probability that $X_2 = x_2$ and $X_1 = 3$ for
$x_2 = 0, 3, 8$.

(i) Determine the conditional distribution $f_{X_2 \mid X_1}(x_2 \mid 3)$ directly using
(16.23) and compare your answer to that of (f).

(j) Determine the Bayesian premium directly using (16.27) and compare
your answer to that of (g).

(k) Determine the structural parameters $\mu$, $v$, and $a$.

(l) Compute the Bühlmann credibility factor and the Bühlmann credibility
premium to approximate the Bayesian premium $E(X_2 \mid X_1 = 3)$.


16.23 Three urns have balls marked 0, 1, and 2 in the proportions given in 
Table 16.6. An urn is selected at random, and two balls are drawn from that 
urn with replacement. A total of 2 on the two balls is observed. Two more

Table 16.6 Data for Exercise 16.23

  Urn    0s      1s      2s
  1      0.40    0.35    0.25
  2      0.25    0.10    0.65
  3      0.50    0.15    0.35

(b) Determine the conditional distributions $f_{X \mid \Theta}(x \mid \theta)$ for the totals on
the two balls for each urn.

(c) Determine the hypothetical means $\mu(\theta)$ and the process variances
$v(\theta)$ for each $\theta$.

(d) Determine the marginal probability that the total $X_1$ on the first
two balls equals 2.

(e) Determine the posterior distribution $\pi_{\Theta \mid X_1}(\theta \mid 2)$ using Bayes'
theorem.

(f) Use (16.25) to determine the conditional distribution $f_{X_2 \mid X_1}(x_2 \mid 2)$
of the total $X_2$ on the next two balls drawn given that $X_1 = 2$ was
observed on the first two draws.

(g) Use (16.28) to determine the Bayesian premium $E(X_2 \mid X_1 = 2)$.

(h) Determine the joint probability that the total $X_2$ on the next two
balls equals $x_2$ and the total $X_1$ on the first two balls equals 2 for
$x_2 = 0, 1, 2, 3, 4$.

(i) Determine the conditional distribution $f_{X_2 \mid X_1}(x_2 \mid 2)$ directly using
(16.23) and compare your answer to that of (f).

(j) Determine the Bayesian premium directly using (16.27) and compare
your answer to that of (g).

(k) Determine the structural parameters $\mu$, $v$, and $a$.

(l) Determine the Bühlmann credibility factor and the Bühlmann credibility
premium.

(m) Show that the Bühlmann credibility factor is the same if each “exposure
unit” consists of one draw from the urn rather than two

16.24 Suppose that there are two types of policyholder: type A and type B. 
Two-thirds of the total number of the policyholders are of type A and one- 
third are of type B. For each type, the information on annual claim numbers 
and severity are given as follows: 

A policyholder has a total claim amount of 500 in the last four years. 
Determine the credibility factor Z and the credibility premium for next year 
for this policyholder.

16.25 Let $\Theta_1$ represent the risk factor for claim numbers and let $\Theta_2$ represent
the risk factor for the claim severity for a line of insurance. Suppose that $\Theta_1$
and $\Theta_2$ are independent. Suppose also that given $\Theta_1 = \theta_1$ the claim number
N is Poisson distributed and given $\Theta_2 = \theta_2$ the severity Y is exponentially
distributed. The expectations of the hypothetical means and process variances
for the claim number and severity as well as the variance of the hypothetical
means for frequency are, respectively,

$$\mu_N = 0.1, \quad v_N = 0.1, \quad a_N = 0.05,$$
$$\mu_Y = 100, \quad v_Y = 25{,}000.$$

Three observations are made on a particular policyholder and we observe total
claims of 200. Determine the Bühlmann credibility factor and the Bühlmann
premium for this policyholder.

16.26 Suppose that $X_1, \ldots, X_n$ are independent (conditional on $\Theta$) and that
$E(X_j \mid \Theta) = \beta_j\mu(\Theta)$ and $\mathrm{Var}(X_j \mid \Theta) = \tau_j(\Theta) + \psi_j v(\Theta)$, $j = 1, \ldots, n$.

Let

$$\mu = E[\mu(\Theta)], \quad v = E[v(\Theta)], \quad \tau_j = E[\tau_j(\Theta)], \quad a = \mathrm{Var}[\mu(\Theta)].$$

(a) Show that

$$E(X_j) = \beta_j\mu, \quad \mathrm{Var}(X_j) = \tau_j + \psi_j v + \beta_j^2 a,$$

and

$$\mathrm{Cov}(X_i, X_j) = \beta_i\beta_j a, \quad i \ne j.$$

(b) Solve the normal equations for $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to show that the
credibility premium satisfies

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = (1 - Z)E(X_{n+1}) + Z\beta_{n+1}\bar{X},$$

where

$$m_j = \beta_j^2(\tau_j + \psi_j v)^{-1}, \quad j = 1, \ldots, n,$$
$$m = m_1 + \cdots + m_n,$$
$$Z = am(1 + am)^{-1},$$

16.27 For the situation described in Exercise 12.72 determine $\mu(\theta)$ and the
Bayesian premium $E(X_{n+1} \mid \mathbf{x})$. Why is the Bayesian premium equal to the
credibility premium?

16.28 For the situation described in Exercise 12.73 determine $\mu(\theta)$ and the
Bayesian premium $E(X_{n+1} \mid \mathbf{x})$ and verify directly that the credibility premium
equals the Bayesian premium.

16.29 For the situation described in Exercise 12.74 determine $\mu(\theta)$ and the
Bayesian premium $E(X_{n+1} \mid \mathbf{x})$ and verify directly that the credibility premium
equals the Bayesian premium.

16.30 Consider the generalization of the linear exponential family given by 


If m is a parameter, this is called the exponential dispersion family. In
Exercise 12.79 it was shown that the mean of this random variable is $-q'(\theta)/q(\theta)$.
For this exercise, assume that m is known.

(a) Consider the prior distribution

$$\pi(\theta) = \frac{[q(\theta)]^{-k}\exp(-\theta\mu k)}{c(\mu, k)}, \quad \theta_0 < \theta < \theta_1, \text{ with } \pi(\theta_0) = \pi(\theta_1) = 0.$$

Determine the Bayesian premium.

(b) Using the same prior, determine the Bühlmann premium.

(c) Show that the inverse Gaussian distribution is a member of the
exponential dispersion family.

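16.31 Suppose that $X_1, \ldots, X_n$ are independent (conditional on $\Theta$) and

$$E(X_j \mid \Theta) = \tau^j\mu(\Theta) \quad \text{and} \quad \mathrm{Var}(X_j \mid \Theta) = \frac{\tau^{2j}v(\Theta)}{m_j}, \quad j = 1, \ldots, n.$$

Let $\mu = E[\mu(\Theta)]$, $v = E[v(\Theta)]$, $a = \mathrm{Var}[\mu(\Theta)]$, $k = v/a$, and $m = m_1 + \cdots + m_n$.

(a) Discuss when these assumptions may be appropriate.

(b) Show that

$$E(X_j) = \tau^j\mu, \quad \mathrm{Var}(X_j) = \tau^{2j}(a + v/m_j),$$

and

$$\mathrm{Cov}(X_i, X_j) = \tau^{i+j}a, \quad i \ne j.$$

(c) Solve the normal equations for $\tilde{\alpha}_0, \tilde{\alpha}_1, \ldots, \tilde{\alpha}_n$ to show that the
credibility premium satisfies

$$\tilde{\alpha}_0 + \sum_{j=1}^{n} \tilde{\alpha}_j X_j = \frac{k}{k + m}\tau^{n+1}\mu + \frac{m}{k + m}\sum_{j=1}^{n} \frac{m_j}{m}\tau^{n+1-j}X_j.$$

(d) Give a verbal interpretation of the formula in (c).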

(e) Suppose that 




-p{x h m h r)e^ r ~ 3x ^ 


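Show that $E(X_j \mid \Theta) = \tau^j\mu(\Theta)$ and that $\mathrm{Var}(X_j \mid \Theta) = \tau^{2j}v(\Theta)/m_j$,
where $\mu(\theta) = -\frac{d}{d\theta}\ln q(\theta)$ and $v(\theta) = -\mu'(\theta)$.

(f) Prove that credibility is exact if $\Theta$ has pdf


which satisfies $\pi(\theta_0) = \pi(\theta_1) = 0$.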

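16.32 Suppose that given $\Theta = \theta$ the random variables $X_1, \ldots, X_n$ are
independent with Poisson pf

$$f_{X_j \mid \Theta}(x_j \mid \theta) = \frac{\theta^{x_j}e^{-\theta}}{x_j!}, \quad x_j = 0, 1, 2, \ldots.$$

(a) Let $S = X_1 + \cdots + X_n$. Show that S has pf

$$f_S(s) = \int_0^\infty \frac{(n\theta)^s e^{-n\theta}}{s!}\,\pi(\theta)\,d\theta, \quad s = 0, 1, 2, \ldots,$$

where $\Theta$ has pdf $\pi(\theta)$.

(b) Show that the Bayesian premium is

$$E(X_{n+1} \mid X_1 + \cdots + X_n = s) = \frac{s + 1}{n}\,\frac{f_S(s + 1)}{f_S(s)},$$

where $s = \sum_{j=1}^{n} x_j$.

(c) Evaluate the distribution of S in (a) when $\pi(\theta)$ is a gamma
distribution. What type of distribution is this?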

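16.33 Suppose $X_j \mid \Theta$ is normally distributed with mean $\Theta$ and variance $v$ for
$j = 1, 2, \ldots, n + 1$. Further suppose $\Theta$ is normally distributed with mean $\mu$
and variance $a$. Thus,

$$f_{X_j \mid \Theta}(x_j \mid \theta) = (2\pi v)^{-1/2}\exp\left[-\frac{1}{2v}(x_j - \theta)^2\right], \quad -\infty < x_j < \infty,$$

and

$$\pi(\theta) = (2\pi a)^{-1/2}\exp\left[-\frac{1}{2a}(\theta - \mu)^2\right], \quad -\infty < \theta < \infty.$$

Determine the posterior distribution of $\Theta \mid \mathbf{X}$ and the predictive distribution
of $X_{n+1} \mid \mathbf{X}$. Then determine the Bayesian estimate of $E(X_{n+1} \mid \mathbf{X})$. Finally,
show that the Bayesian and Bühlmann estimates are equal.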

16.34 (*) Your friend selected at random one of two urns and then she pulled
a ball with number 4 on it from the urn. Then she replaced the ball in the urn.
One of the urns contains four balls, numbered 1-4. The other urn contains six
balls, numbered 1-6. Your friend will make another random selection from
the same urn.

(a) Estimate the expected value of the number on the next ball using
the Bayesian method.

(b) Estimate the expected number on the next ball using Bühlmann
credibility.

16.35 The number of claims for a randomly selected insured has the Poisson
distribution with parameter $\theta$. The parameter $\theta$ is distributed across the
population with pdf $\pi(\theta) = 3\theta^{-4}$, $\theta > 1$. For an individual, the parameter
does not change over time. A particular insured experienced a total of 20
claims in the previous two years.

(a) (*) Determine the Bühlmann credibility estimate for the future
expected claim frequency for this particular insured.

(b) Determine the Bayesian credibility estimate for the future expected
claim frequency for this particular insured.

16.36 (*) The distribution of payments to an insured is constant over time.
If the Bühlmann credibility assigned for one-half year of observation is 0.5,
determine the Bühlmann credibility to be assigned for three years.

16.37 (*) Three urns contain balls marked either 0 or 1. In urn A, 10% are 
marked 0; in urn B, 60% axe marked 0; and in urn C, 80% are marked 0. 
An urn is selected at random and three balls selected with replacement. The 
total of the values is 1. Three more balls are selected with replacement from 
the same urn. 

(a) Determine the expected total of the three balls using Bayes ， theo¬ 
rem. 

(b) Determine the expected total of the three balls using Btihlmann 
credibility. 


16.38 (*) The number of claims follows the Poisson distribution with parameter 
$\lambda$. A particular insured had three claims in the past three years. 

(a) The value of $\lambda$ has pdf $f(\lambda) = 4\lambda^{-5}$, $\lambda > 1$. Determine the value 
of K used in Bühlmann's credibility formula. Then use Bühlmann 
credibility to estimate the claim frequency for this insured. 

(b) The value of $\lambda$ has pdf $f(\lambda) = 1$, $0 < \lambda < 1$. Determine the value 
of K used in Bühlmann's credibility formula. Then use Bühlmann 
credibility to estimate the claim frequency for this insured. 

16.39 (*) The number of claims follows the Poisson distribution with parameter 
$h$. The value of $h$ has the gamma distribution with pdf $f(h) = he^{-h}$, 
$h > 0$. Determine the Bühlmann credibility to be assigned to a single 
observation. (The Bayes solution was obtained in Exercise 12.86.) 


Ltion of Exercise 12.88. 


GREATEST ACCURACY CREDIBILITY THEORY 


the number of claims per risk per year has mean | and variance 嘉 while the 
amount of a single claim has mean 2 and variance 5. A risk is selected at 
random from one of the two classes and is observed for four years. 

(a) Determine the value of Z for Bühlmann credibility for the observed 
pure premium. 

(b) Suppose the pure premium calculated from the four observations 
is 0.25. Determine the Bühlmann credibility estimate for the risk's 
pure premium. 


16.44 (*) Let $X_1$ be the outcome of a single trial and let $E(X_2|X_1)$ be the 
expected value of the outcome of a second trial. You are given the following 
information: 




Buhlmann estimate of 
E(X 2 \X 1 = T) 

Bayesian estimate of 
E(X 2 \X 1 =T) 



2.72 

2.6 



7.71 

7.8 

mM 

IHH9 

10.57 

— 


Determine the Bayesian estimate for $E(X_2|X_1 = 12)$. 

16.45 Consider the situation of Exercise 12.90. 

(a) Determine the expected number of claims in the second year using 
Bayesian credibility. 

(b) (*) Determine the expected number of claims in the second year 
using Bühlmann credibility. 

16.46 Consider the situation of Exercise 12.91. 

(a) Determine the expected number of claims in the second year using 
Bayesian credibility. 

(b) Determine the expected number of claims in the second year using 
Bühlmann credibility. 

16.47 Two spinners, $A_1$ and $A_2$, are used to determine the number of claims. 
For spinner $A_1$ there is a 0.15 probability of one claim and 0.85 of no claim. 
For spinner $A_2$ there is a 0.05 probability of one claim and 0.95 of no claim. 
If there is a claim, one of two spinners, $B_1$ and $B_2$, is used to determine the 
amount. Spinner $B_1$ produces a claim of 20 with probability 0.8 and 40 with 
probability 0.2. Spinner $B_2$ produces a claim of 20 with probability 0.3 and 40 
with probability 0.7. A spinner is selected at random from each of $A_1$, $A_2$ and 
from $B_1$, $B_2$. Three observations from the selected pair yield claim amounts 
of 0, 20, and 0. 

(a) (*) Use Bühlmann credibility to separately estimate the expected 
number of claims and the expected severity. Use these estimates to 
estimate the expected value of the next observation from the same 
pair of spinners. 

(b) Use Bühlmann credibility once on the three observations to estimate 
the expected value of the next observation from the same pair 
of spinners. 

(c) (*) Repeat parts (a) and (b) using Bayesian estimation. 

(d) (*) For the same selected pair of spinners, determine 

= X 2 = • • • = X n ^i = 0). 

16.48 (*) A portfolio of risks is such that all risks are normally distributed. 
Those of type A have a mean of 0.1 and a standard deviation of 0.03. Those 









ution is a member of the linear expo- 

k(6) is a conjugate prior for fx\e{ x \^)- 
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16.51 The amount of an individual claim has the exponential distribution 
with pdf $f_{Y|\Lambda}(y|\lambda) = \lambda^{-1}e^{-y/\lambda}$, $y, \lambda > 0$. The parameter $\lambda$ has the inverse 
gamma distribution with pdf $\pi(\lambda) = 400\lambda^{-3}e^{-20/\lambda}$. 

(a) (*) Determine the unconditional expected value, E(X). 

(b) Suppose two claims were observed with values 15 and 25. Determine 
the Bühlmann credibility estimate of the expected value of 
the next claim from the same insured. 

(c) Repeat part (b), but determine the Bayesian credibility estimate. 

16.52 The distribution of the number of claims is binomial with $n = 1$ and $\theta$ 
unknown. The parameter $\theta$ is distributed with mean 0.25 and variance 0.07. 
Determine the value of Z for a single observation using Bühlmann's credibility 
formula. 

16.53 (*) Consider four marksmen. Each is firing at a target that is 100 feet 
away. The four targets are 2 feet apart (that is, they lie on a straight line at 
positions 0, 2, 4, and 6 in feet). The marksmen miss to the left or right, never 






randomly selected marksman. 

(b) Which of the following will increase Bühlmann credibility the most? 
i. Revise the targets to 0, 4, 8, and 12. 

ii. Move the marksmen to 60 feet from the targets. 

iii. Revise targets to 2, 2, 10, 10. 

iv. Increase the number of observations from the same marksman 
to three. 

v. Move two of the marksmen to 50 feet from the targets and 
increase the number of observations from the same marksman 
to two. 

16.54 (*) Risk 1 produces claims of amounts 100, 1,000, and 20,000 with 
probabilities 0.5, 0.3, and 0.2, respectively. For risk 2 the probabilities are 
0.7, 0.2, and 0.1. Risk 1 is twice as likely as risk 2 of being observed. A claim 
of 100 is observed, but the observed risk is unknown. 

(a) Determine the Bayesian credibility estimate of the expected value 
of the second claim amount from the same risk. 

(b) Determine the Buhlmann credibility estimate of the expected value 
of the second claim amount from the same risk. 










16.55 (*) You are given the following: 

1. The number of claims for a single insured follows a Poisson distribution 
with mean $M$. 

2. The amount of a single claim has an exponential distribution with pdf 
$f_{X|\Lambda}(x|\lambda) = \lambda^{-1}e^{-x/\lambda}$, $x, \lambda > 0$. 

3. $M$ and $\Lambda$ are independent. 

4. $E(M)$ = 0.10 and $\mathrm{Var}(M)$ = 0.0025. 

5. $E(\Lambda)$ = 1,000 and $\mathrm{Var}(\Lambda)$ = 640,000. 

6. The number of claims and the claim amounts are independent. 

(a) Determine the expected value of the pure premium's process variance 
for a single risk. 

(b) Determine the variance of the hypothetical means for the pure 
premium. 

16.56 In Example 16.24, if $\rho = 0$, then $Z = 0$, and the estimator is $\mu$. That 
is, the data should be ignored. However, as $\rho$ increases toward 1, $Z$ increases 
to 1, and the sample mean becomes the preferred predictor of $X_{n+1}$. Explain 
why this is a reasonable result. 





)ution with 〆 unknown and a = 2. The prior distribution for © (using 0 
resent the unknown value of fj) is normal with mean 5 and standard de- 
a 1. Determine the Bayes, credibility, and log-credibility estimators and 
ire their mean-squared errors, evaluating them at the same percentiles 
d in Examples 16.32 and 16.33. 




16.59 In the following, let the random vector $\mathbf{X}$ represent all the past data 
and let $X_{n+1}$ represent the next observation. Let $g(\mathbf{X})$ be any function of the 
past data. 

(a) Prove that the following is true. 

$$E\{[X_{n+1} - g(\mathbf{X})]^2\} = E\{[X_{n+1} - E(X_{n+1}|\mathbf{X})]^2\} + E\{[E(X_{n+1}|\mathbf{X}) - g(\mathbf{X})]^2\},$$

where the expectation is taken over $(X_{n+1}, \mathbf{X})$. 

(b) Show that setting $g(\mathbf{X})$ equal to the Bayesian premium (the mean 
of the predictive distribution) minimizes the expected squared error, 
$E\{[X_{n+1} - g(\mathbf{X})]^2\}$. 

(c) Show that, if $g(\mathbf{X})$ is restricted to be a linear function of the past 
data, then the expected squared error is minimized by the credibility 
premium. 


16.5 EMPIRICAL BAYES PARAMETER ESTIMATION 


In the previous section a modeling methodology was proposed which suggested 
the use of either the Bayesian or credibility premium as a way to incorporate 


Interest because the input < 
be known. These examples 


las not yet been addressed. 
)btain numerical values for 
tions fxj\ei x j\^) and w(0) 
useful for illustration of the 
v reDresent the business of 


More practical models of necessity involve the use oi parameters 
t be chosen to ensure a close agreement between the model and 
amples of this include: the Poisson-gamma model (Example 16.15 )， 


the gamma parameters a ana p neea to oe seiectea or tne nunim 
Buhlmaim-Straub parameters //, v, and a. Assignment of numerical va 
the Bayesian or credibility premium requires that these parameters 
>laced by numerical values. 

In general, the unknown Daxameters axe those associated with the strucl 


follows the Bayesian framework of the previous section, 
a the Bayesian context all structural parameters are as- 
here is no need for estimation. An example of this is the 
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to use the data at hand to estimate the structural (prior) parameters. This 
approach is called empirical Bayes estimation. 

We refer to the situation where $\pi(\theta)$ and $f_{X_j|\Theta}(x_j|\theta)$ are left largely 
unspecified (for example, in the Bühlmann or Bühlmann-Straub models where 
only the first two moments need be known) as the nonparametric case. This 
situation is dealt with in Section 16.5.1. If $f_{X_j|\Theta}(x_j|\theta)$ is assumed to be of 
parametric form (e.g., Poisson, normal, etc.) but not $\pi(\theta)$, then we refer to 
the problem as being of a semiparametric nature, and this is considered in 
Section 16.5.2. Finally, the (technically more difficult) fully parametric case 
where both $f_{X_j|\Theta}(x_j|\theta)$ and $\pi(\theta)$ are assumed to be of parametric form is 
briefly discussed in Section 16.5.3. 

This decision as to whether to select a parametric model or not depends partially 
on the situation at hand and partially on the judgment and knowledge of 
the person doing the analysis. For example, an analysis based on claim counts 
might involve the assumption that $f_{X_j|\Theta}(x_j|\theta)$ is of Poisson form, whereas the 
choice of a parametric model for $\pi(\theta)$ may not be reasonable. 

Any parametric assumptions should be reflected (as far as possible) in parametric 
estimation. For example, in the Poisson case, because the mean and 
variance are equal, the same estimate would normally be used for both. 
Nonparametric estimators would normally be no more efficient than estimators 
appropriate for the parametric model selected, assuming that the model 
selected is appropriate. This notion is relevant for the decision as to whether 
to select a parametric model. 

Finally, nonparametric models have the advantage of being appropriate for 
a wide variety of situations, a fact which may well eliminate the extra burden 
of a parametric assumption (often a stronger assumption than is reasonable). 

In this section the data are assumed to be of the following form. For 
each of $r > 1$ policyholders we have the observed losses per unit of exposure 
$\mathbf{X}_i = (X_{i1}, \ldots, X_{in_i})^T$ for $i = 1, \ldots, r$. The random vectors $\{\mathbf{X}_i, i = 1, \ldots, r\}$ 
are assumed to be statistically independent (experience of different policyholders 
is assumed to be independent). The (unknown) risk parameter for the ith 
policyholder is $\theta_i$, $i = 1, \ldots, r$, and it is assumed further that $\theta_1, \ldots, \theta_r$ are 
realizations of the independent and identically distributed random variables $\Theta_i$ 
with structural density $\pi(\theta_i)$. For fixed $i$, the (conditional) random variables 
$X_{ij}|\Theta_i$ are assumed to be independent with pf $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$, $j = 1, \ldots, n_i$. 

Two particularly common cases produce this data format. The first is 
classification ratemaking or experience rating. In either, $i$ indexes the classes 
or groups and $j$ indexes the individual members. The second case is like the 
first where $i$ continues to index the class or group, but now $j$ is the year and 
the observation is the average loss for that year. An example of the second 
setting is Meyers [91], where $i = 1, \ldots, 319$ employment classifications are 
studied over $j = 1, 2, 3$ years. Regardless of the potential settings, we will 
refer to the $r$ entities as policyholders. 

There may also be a known exposure vector $\mathbf{m}_i = (m_{i1}, m_{i2}, \ldots, m_{in_i})^T$ 
for policyholder $i$, where $i = 1, \ldots, r$. If not (and if it is appropriate) one may 
set $m_{ij} = 1$ in what follows for all $i$ and $j$. For notational convenience let 

$$m_i = \sum_{j=1}^{n_i} m_{ij}, \quad i = 1, \ldots, r,$$

be the total past exposure for policyholder $i$, and let 

$$\bar{X}_i = \frac{1}{m_i}\sum_{j=1}^{n_i} m_{ij}X_{ij}, \quad i = 1, \ldots, r,$$

be the past average loss experience. Furthermore, the total exposure is 

$$m = \sum_{i=1}^{r} m_i = \sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}$$

and the overall average losses are 

$$\bar{X} = \frac{1}{m}\sum_{i=1}^{r} m_i\bar{X}_i = \frac{1}{m}\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}X_{ij}. \tag{16.54}$$

The parameters which need to be estimated depend on what is assumed 
about the distributions $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ and $\pi(\theta)$. 

For the Bühlmann-Straub formulation there are additional quantities of 
interest. The hypothetical mean (assumed not to depend on $j$) is 

$$E(X_{ij}|\Theta_i = \theta_i) = \mu(\theta_i)$$

and the process variance is 

$$\mathrm{Var}(X_{ij}|\Theta_i = \theta_i) = \frac{v(\theta_i)}{m_{ij}}.$$

The structural parameters are 

$$\mu = E[\mu(\Theta_i)], \quad v = E[v(\Theta_i)],$$

and 

$$a = \mathrm{Var}[\mu(\Theta_i)].$$

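The approach to be followed in this section is to estimate $\mu$, $v$, and $a$ (when 
unknown) from the data. The credibility premium for next year's losses (per 
exposure unit) for policyholder $i$ is 

$$Z_i\bar{X}_i + (1 - Z_i)\mu, \quad i = 1, \ldots, r, \tag{16.55}$$

where 

$$Z_i = \frac{m_i}{m_i + k}, \qquad k = \frac{v}{a}.$$

If estimators of $\mu$, $v$, and $a$ are denoted by $\hat{\mu}$, $\hat{v}$, and $\hat{a}$, respectively, then one 
would replace the credibility premium (16.55) by its estimator 

$$\hat{Z}_i\bar{X}_i + (1 - \hat{Z}_i)\hat{\mu}, \tag{16.56}$$

where $\hat{Z}_i = m_i/(m_i + \hat{k})$ and $\hat{k} = \hat{v}/\hat{a}$. 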


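Example 16.34 Suppose that $n_i = n > 1$ for all $i$ and $m_{ij} = 1$ for all $i$ and 
$j$. That is, for policyholder $i$, we have the loss vector 

$$\mathbf{X}_i = (X_{i1}, \ldots, X_{in})^T, \quad i = 1, \ldots, r.$$

Furthermore, conditional on $\Theta_i = \theta_i$, $X_{ij}$ has mean 

$$\mu(\theta_i) = E(X_{ij}|\Theta_i = \theta_i)$$

and variance 

$$v(\theta_i) = \mathrm{Var}(X_{ij}|\Theta_i = \theta_i),$$

and $X_{i1}, \ldots, X_{in}$ are independent (conditionally). Also, different policyholders' 
past data are independent, so that if $i \neq s$, then $X_{ij}$ and $X_{st}$ are independent. 
In this case 

$$\bar{X}_i = n^{-1}\sum_{j=1}^{n} X_{ij} \quad\text{and}\quad \bar{X} = r^{-1}\sum_{i=1}^{r}\bar{X}_i = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n} X_{ij}.$$

Determine unbiased estimators of the Bühlmann quantities. 

An unbiased estimator of $\mu$ is 

$$\hat{\mu} = \bar{X}$$

because 

$$E(\hat{\mu}) = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n} E(X_{ij}) = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n} E[E(X_{ij}|\Theta_i)]$$
$$= (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n} E[\mu(\Theta_i)] = (rn)^{-1}\sum_{i=1}^{r}\sum_{j=1}^{n}\mu = \mu.$$

To estimate $v$, consider 

$$\hat{v}_i = \frac{1}{n-1}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2.$$

Recall that for fixed $i$ the random variables $X_{i1}, \ldots, X_{in}$ are independent, 
conditional on $\Theta_i = \theta_i$. Thus, $\hat{v}_i$ is an unbiased estimate of $\mathrm{Var}(X_{ij}|\Theta_i = \theta_i) = v(\theta_i)$. 
Unconditionally, 

$$E(\hat{v}_i) = E[E(\hat{v}_i|\Theta_i)] = E[v(\Theta_i)] = v$$

and $\hat{v}_i$ is unbiased for $v$. Hence an unbiased estimator of $v$ is 

$$\hat{v} = \frac{1}{r}\sum_{i=1}^{r}\hat{v}_i = \frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2. \tag{16.57}$$

We now turn to estimation of the parameter $a$. Begin with 

$$E(\bar{X}_i|\Theta_i = \theta_i) = n^{-1}\sum_{j=1}^{n} E(X_{ij}|\Theta_i = \theta_i) = n^{-1}\sum_{j=1}^{n}\mu(\theta_i) = \mu(\theta_i).$$

Thus, 

$$E(\bar{X}_i) = E[E(\bar{X}_i|\Theta_i)] = E[\mu(\Theta_i)] = \mu$$

and 

$$\mathrm{Var}(\bar{X}_i) = \mathrm{Var}[E(\bar{X}_i|\Theta_i)] + E[\mathrm{Var}(\bar{X}_i|\Theta_i)] = \mathrm{Var}[\mu(\Theta_i)] + E\left[\frac{v(\Theta_i)}{n}\right] = a + \frac{v}{n}.$$

Therefore, $\bar{X}_1, \ldots, \bar{X}_r$ are independent with common mean $\mu$ and common 
variance $a + v/n$. Their sample average is $\bar{X} = r^{-1}\sum_{i=1}^{r}\bar{X}_i$. Consequently, 
an unbiased estimator of $a + v/n$ is $(r-1)^{-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2$. Because we 
already have an unbiased estimator of $v$ given above, an unbiased estimator 
of $a$ is given by 

$$\hat{a} = \frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2 - \frac{\hat{v}}{n} = \frac{1}{r-1}\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2 - \frac{1}{rn(n-1)}\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2. \tag{16.58}$$

□ 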

These estimators might look familiar. Consider a one-factor analysis of 
variance in which each policyholder represents a treatment. The estimator 
for $v$ (16.57) is the within (also called the error) mean square. The first term 
in the estimator for $a$ (16.58) is the between (also called the treatment) mean 
square divided by $n$. The hypothesis that all treatments have the same mean 
is accepted when the between mean square is small relative to the within mean 
square, that is, when $\hat{a}$ is small relative to $\hat{v}$. But that implies $\hat{Z}$ will be near 
zero and little credibility will be given to each $\bar{X}_i$. This is as it should be 
when the policyholders are essentially identical. 

Due to the subtraction in (16.58), it is possible that $\hat{a}$ could be negative. 
When that happens, it is customary to set $\hat{a} = \hat{Z} = 0$. This case is equivalent 
to the F test statistic in the analysis of variance being less than 1, a case that 
always leads to an acceptance of the hypothesis of equal means. 
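
The link between a negative estimate of $a$ and an F statistic below 1 can be checked numerically. The following is a minimal sketch, assuming equal sample sizes as in Example 16.34; the function name `buhlmann_anova` and the `similar` data set are hypothetical and chosen only for illustration.

```python
def buhlmann_anova(data):
    """Return (a_hat, F) for r policyholders with n observations each.

    data is a list of r equal-length lists of losses (hypothetical layout).
    """
    r, n = len(data), len(data[0])
    means = [sum(row) / n for row in data]
    grand = sum(means) / r
    # Within (error) mean square: the estimator of v in (16.57).
    within = sum((x - m) ** 2 for row, m in zip(data, means) for x in row) / (r * (n - 1))
    # Between (treatment) mean square of a one-factor analysis of variance.
    between = n * sum((m - grand) ** 2 for m in means) / (r - 1)
    a_hat = between / n - within / n  # estimator (16.58)
    f_stat = between / within         # ANOVA F statistic
    return a_hat, f_stat


similar = [[3, 5, 7], [4, 6, 5], [5, 3, 7]]  # made-up, nearly identical policyholders
distinct = [[3, 5, 7], [6, 12, 9]]           # losses used in Example 16.35

a1, f1 = buhlmann_anova(similar)
a2, f2 = buhlmann_anova(distinct)
print(a1, f1)
print(a2, f2)
```

Because $\hat{a}$ equals (between mean square minus within mean square) divided by $n$, its sign always agrees with the sign of $F - 1$ in this equal-sample-size setting.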

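Example 16.35 (Example 16.34 continued) As a numerical illustration, suppose 
we have $r = 2$ policyholders with $n = 3$ years experience for each. Let 
the losses be $\mathbf{x}_1 = (3, 5, 7)^T$ and $\mathbf{x}_2 = (6, 12, 9)^T$. Estimate the Bühlmann 
credibility premiums for each policyholder. 

We have 

$$\bar{X}_1 = \tfrac{1}{3}(3 + 5 + 7) = 5, \quad \bar{X}_2 = \tfrac{1}{3}(6 + 12 + 9) = 9,$$

and so $\bar{X} = \tfrac{1}{2}(5 + 9) = 7$. Then $\hat{\mu} = 7$. We next have 

$$\hat{v}_1 = \tfrac{1}{2}[(3 - 5)^2 + (5 - 5)^2 + (7 - 5)^2] = 4,$$
$$\hat{v}_2 = \tfrac{1}{2}[(6 - 9)^2 + (12 - 9)^2 + (9 - 9)^2] = 9,$$

and so $\hat{v} = \tfrac{1}{2}(4 + 9) = \tfrac{13}{2}$. Then 

$$\hat{a} = [(5 - 7)^2 + (9 - 7)^2] - \tfrac{1}{3}\hat{v} = \tfrac{35}{6}.$$

Next, $\hat{k} = \hat{v}/\hat{a} = \tfrac{39}{35}$ and the estimated credibility factor is $\hat{Z} = 3/(3 + \tfrac{39}{35}) = \tfrac{35}{48}$. 
The estimated credibility premiums are 

$$\hat{Z}\bar{X}_1 + (1 - \hat{Z})\hat{\mu} = \left(\tfrac{35}{48}\right)(5) + \left(\tfrac{13}{48}\right)(7) = \tfrac{133}{24},$$
$$\hat{Z}\bar{X}_2 + (1 - \hat{Z})\hat{\mu} = \left(\tfrac{35}{48}\right)(9) + \left(\tfrac{13}{48}\right)(7) = \tfrac{203}{24}$$

for policyholders 1 and 2 respectively. □ 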
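
The arithmetic of Example 16.35 can be reproduced in exact fractions. This is a minimal sketch of the estimators (16.57) and (16.58) for the equal-sample-size Bühlmann model; the function name `buhlmann_premiums` is an assumption made for illustration, and the rule of setting $\hat{Z} = 0$ when $\hat{a} \le 0$ follows the convention described above.

```python
from fractions import Fraction


def buhlmann_premiums(data):
    """Nonparametric Buhlmann estimates for r policyholders, n years each."""
    r, n = len(data), len(data[0])
    rows = [[Fraction(x) for x in row] for row in data]
    means = [sum(row) / n for row in rows]          # policyholder means
    mu_hat = sum(means) / r                         # overall mean
    # Estimator (16.57): pooled within-policyholder variance.
    v_hat = sum((x - m) ** 2 for row, m in zip(rows, means) for x in row) / (r * (n - 1))
    # Estimator (16.58): variance of the means less v_hat / n.
    a_hat = sum((m - mu_hat) ** 2 for m in means) / (r - 1) - v_hat / n
    # Customary convention: no credibility when a_hat is not positive.
    z_hat = n / (n + v_hat / a_hat) if a_hat > 0 else Fraction(0)
    premiums = [z_hat * m + (1 - z_hat) * mu_hat for m in means]
    return mu_hat, v_hat, a_hat, z_hat, premiums


mu_hat, v_hat, a_hat, z_hat, premiums = buhlmann_premiums([[3, 5, 7], [6, 12, 9]])
print(mu_hat, v_hat, a_hat, z_hat, premiums)
```

Using `Fraction` avoids rounding, so the output can be compared digit for digit with the fractions in the example.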

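We now turn to the more general Bühlmann-Straub setup described earlier 
in this section. We have $E(X_{ij}) = E[E(X_{ij}|\Theta_i)] = E[\mu(\Theta_i)] = \mu$. Thus, 

$$E(\bar{X}_i|\Theta_i) = \sum_{j=1}^{n_i}\frac{m_{ij}}{m_i}E(X_{ij}|\Theta_i) = \sum_{j=1}^{n_i}\frac{m_{ij}}{m_i}\mu(\Theta_i) = \mu(\Theta_i),$$

implying that 

$$E(\bar{X}_i) = E[E(\bar{X}_i|\Theta_i)] = E[\mu(\Theta_i)] = \mu.$$

Finally, 

$$E(\bar{X}) = \frac{1}{m}\sum_{i=1}^{r} m_iE(\bar{X}_i) = \frac{1}{m}\sum_{i=1}^{r} m_i\mu = \mu,$$

and so an obvious unbiased estimator of $\mu$ is 

$$\hat{\mu} = \bar{X}. \tag{16.59}$$

Now, $E(X_{ij}|\Theta_i) = \mu(\Theta_i)$ and $\mathrm{Var}(X_{ij}|\Theta_i) = v(\Theta_i)/m_{ij}$ for $j = 1, \ldots, n_i$. 
Consider 

$$\hat{v}_i = \frac{\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \bar{X}_i)^2}{n_i - 1}, \quad i = 1, \ldots, r. \tag{16.60}$$

Condition on $\Theta_i$ and use (16.12) with $\beta = 0$ and $\alpha = v(\Theta_i)$. Then $E(\hat{v}_i|\Theta_i) = v(\Theta_i)$. 
But this means that, unconditionally, 

$$E(\hat{v}_i) = E[E(\hat{v}_i|\Theta_i)] = E[v(\Theta_i)] = v$$

and so $\hat{v}_i$ is unbiased for $v$ for $i = 1, \ldots, r$. 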


Another unbiased estimator for $v$ is then the weighted average 
$\hat{v} = \sum_{i=1}^{r} w_i\hat{v}_i$, where $\sum_{i=1}^{r} w_i = 1$. If we 
choose weights proportional to $n_i - 1$, we weight the original $X_{ij}$s by $m_{ij}$. 
That is, with $w_i = (n_i - 1)/\sum_{i=1}^{r}(n_i - 1)$, we obtain an unbiased estimator 
of $v$, namely, 

$$\hat{v} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \bar{X}_i)^2}{\sum_{i=1}^{r}(n_i - 1)}. \tag{16.61}$$

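We now turn to estimation of $a$. Recall that for fixed $i$ the random variables 
$X_{i1}, \ldots, X_{in_i}$ are independent, conditional on $\Theta_i$. Thus, 

$$\mathrm{Var}(\bar{X}_i|\Theta_i) = \sum_{j=1}^{n_i}\left(\frac{m_{ij}}{m_i}\right)^2\mathrm{Var}(X_{ij}|\Theta_i) = \sum_{j=1}^{n_i}\left(\frac{m_{ij}}{m_i}\right)^2\frac{v(\Theta_i)}{m_{ij}} = \frac{v(\Theta_i)}{m_i}.$$

But this means that, unconditionally, 

$$\mathrm{Var}(\bar{X}_i) = \mathrm{Var}[E(\bar{X}_i|\Theta_i)] + E[\mathrm{Var}(\bar{X}_i|\Theta_i)] = \mathrm{Var}[\mu(\Theta_i)] + E\left[\frac{v(\Theta_i)}{m_i}\right] = a + \frac{v}{m_i}. \tag{16.62}$$

To summarize, $\bar{X}_1, \ldots, \bar{X}_r$ are independent with common mean $\mu$ and 
variances $\mathrm{Var}(\bar{X}_i) = a + v/m_i$. Furthermore, $\bar{X} = m^{-1}\sum_{i=1}^{r} m_i\bar{X}_i$. Now, (16.12) 
may again be used with $\beta = a$ and $\alpha = v$ to yield 

$$E\left[\sum_{i=1}^{r} m_i(\bar{X}_i - \bar{X})^2\right] = a\left(m - m^{-1}\sum_{i=1}^{r} m_i^2\right) + v(r - 1).$$

An unbiased estimator for $a$ may be obtained by replacing $v$ by an unbiased 
estimator $\hat{v}$ and "solving" for $a$. That is, an unbiased estimator of $a$ is 

$$\hat{a} = \left(m - m^{-1}\sum_{i=1}^{r} m_i^2\right)^{-1}\left[\sum_{i=1}^{r} m_i(\bar{X}_i - \bar{X})^2 - \hat{v}(r - 1)\right] \tag{16.63}$$

with $\hat{v}$ given by (16.61). An alternative form of (16.63) is given in Exercise 
16.67. 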

Some remarks are in order at this point. Equations (16.59), (16.61), and 
(16.63) provide unbiased estimators for $\mu$, $v$, and $a$, respectively. They are 
nonparametric, requiring no distributional assumptions. They are certainly 
not the only (unbiased) estimators which could be used, and it is possible 
that $\hat{a} < 0$. In this case, $a$ is likely to be close to 0, and it makes sense 
to set $\hat{Z} = 0$. Furthermore, the ordinary Bühlmann estimators of Example 
16.34 are recovered with $m_{ij} = 1$ and $n_i = n$. Finally, as may be 
seen from Example 16.41, these estimators are essentially maximum likelihood 
estimators in the case where $X_{ij}|\Theta_i$ and $\Theta_i$ are both normally distributed, and 
thus the estimators have good statistical properties. 

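There is one problem using the formulas developed above. In the past, the 
data from the ith policyholder was collected on an exposure of $m_i$. Total losses 
on all policyholders were $TL = \sum_{i=1}^{r} m_i\bar{X}_i$. If we had charged the credibility 
premium as given above, the total premium would have been 

$$TP = \sum_{i=1}^{r} m_i[\hat{Z}_i\bar{X}_i + (1 - \hat{Z}_i)\hat{\mu}]$$
$$= \sum_{i=1}^{r} m_i(1 - \hat{Z}_i)(\hat{\mu} - \bar{X}_i) + \sum_{i=1}^{r} m_i\bar{X}_i$$
$$= \sum_{i=1}^{r} m_i\frac{\hat{k}}{m_i + \hat{k}}(\hat{\mu} - \bar{X}_i) + \sum_{i=1}^{r} m_i\bar{X}_i.$$

It is often desirable for TL to equal TP. The reason is that any premium 
increases that will meet the approval of regulators will be based on the total 
claim level from past experience. While credibility adjustments make both 
practical and theoretical sense, it is usually a good idea to keep the total 
unchanged. For this to happen, we need 

$$0 = \sum_{i=1}^{r} m_i\frac{\hat{k}}{m_i + \hat{k}}(\hat{\mu} - \bar{X}_i) = \hat{k}\sum_{i=1}^{r}\hat{Z}_i(\hat{\mu} - \bar{X}_i),$$

or 

$$\hat{\mu} = \frac{\sum_{i=1}^{r}\hat{Z}_i\bar{X}_i}{\sum_{i=1}^{r}\hat{Z}_i}. \tag{16.64}$$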


Table 16.7 Data for Example 16.36 

Policyholder                  Year 1    Year 2    Year 3    Year 4 
1             Total claims    -         10,000    13,000    - 
              No. in group    -         50        60        75 
2             Total claims    18,000    21,000    17,000    - 
              No. in group    100       110       105       90 


That is, rather than using (16.59) to compute $\hat{\mu}$, use a credibility-weighted 
average of the individual sample means. Either method provides an unbiased 
estimator (given the $\hat{Z}_i$s), but this latter one has the advantage of preserving 
total claims. It should be noted that when using (16.63), the value of $\bar{X}$ from 
(16.54) should still be used. It can also be derived by least squares arguments. 
Finally, from Example 16.7 and noting the form of $\mathrm{Var}(\bar{X}_i)$ in (16.62), the 
weights in (16.64) provide the smallest unconditional variance for $\hat{\mu}$. 

Example 16.36 Past data on two group policyholders are available and are 
given in Table 16.7. Determine the estimated credibility premium to be charged 
to each group in year 4. 

We first need to determine the average claims per person for each group in 
each past year. We have $n_1 = 2$ years experience for group 1 and $n_2 = 3$ for 
group 2. It is immaterial which past years' data we have for policyholder 1, 
so for notational purposes we will choose 

$$m_{11} = 50 \quad\text{and}\quad X_{11} = \frac{10{,}000}{50} = 200.$$

Similarly, 

$$m_{12} = 60 \quad\text{and}\quad X_{12} = \frac{13{,}000}{60} = 216.67.$$

Then 

$$m_1 = m_{11} + m_{12} = 50 + 60 = 110, \quad \bar{X}_1 = \frac{10{,}000 + 13{,}000}{110} = 209.09.$$

For policyholder 2, 

$$m_{21} = 100, \quad X_{21} = \frac{18{,}000}{100} = 180,$$
$$m_{22} = 110, \quad X_{22} = \frac{21{,}000}{110} = 190.91,$$
$$m_{23} = 105, \quad X_{23} = \frac{17{,}000}{105} = 161.90.$$

Then 

$$m_2 = m_{21} + m_{22} + m_{23} = 100 + 110 + 105 = 315, \quad \bar{X}_2 = \frac{18{,}000 + 21{,}000 + 17{,}000}{315} = 177.78.$$

Now, $m = m_1 + m_2 = 110 + 315 = 425$. The overall mean is 

$$\hat{\mu} = \bar{X} = \frac{10{,}000 + 13{,}000 + 18{,}000 + 21{,}000 + 17{,}000}{425} = 185.88.$$

The alternative estimate of $\mu$ (16.64) cannot be computed until later. 
Now, 

$$\hat{v} = \frac{50(200 - 209.09)^2 + 60(216.67 - 209.09)^2 + 100(180 - 177.78)^2 + 110(190.91 - 177.78)^2 + 105(161.90 - 177.78)^2}{(2 - 1) + (3 - 1)} = 17{,}837.87$$

and 

$$\hat{a} = \frac{110(209.09 - 185.88)^2 + 315(177.78 - 185.88)^2 - (17{,}837.87)(1)}{425 - (110^2 + 315^2)/425} = 380.76.$$

Then $\hat{k} = 17{,}837.87/380.76 = 46.85$. The estimated credibility factors are 
$\hat{Z}_1 = 110/(110 + 46.85) = 0.70$ and $\hat{Z}_2 = 315/(315 + 46.85) = 0.87$. The 
estimate of $\mu$ from (16.64) is 

$$\hat{\mu} = \frac{0.70(209.09) + 0.87(177.78)}{0.70 + 0.87} = 191.74.$$

The credibility premiums are 

$$0.70(209.09) + 0.30(191.74) = 203.89, \quad 0.87(177.78) + 0.13(191.74) = 179.59.$$

The total past credibility premium is 110(203.89) + 315(179.59) = 78,998.75. 
Except for rounding error, this matches the actual total losses of 79,000. □ 
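
The Bühlmann-Straub calculation in Example 16.36 can be sketched in code as follows. This is an illustrative implementation of (16.61), (16.63), and (16.64) under the assumptions of this section; the function name `buhlmann_straub` and the list-of-lists data layout are hypothetical. Because it carries full floating-point precision rather than the two-decimal intermediate values used in the example, its estimates of $v$ and $a$ differ slightly from 17,837.87 and 380.76.

```python
def buhlmann_straub(claims, exposures):
    """claims[i][j] and exposures[i][j] are the total losses and the exposure
    of policyholder i in past period j (periods may differ by policyholder)."""
    r = len(claims)
    m_i = [sum(e) for e in exposures]                       # exposure per policyholder
    xbar_i = [sum(c) / mi for c, mi in zip(claims, m_i)]    # average loss per exposure unit
    m = sum(m_i)
    xbar = sum(sum(c) for c in claims) / m                  # overall average, (16.54)
    # Estimator (16.61) of v.
    ss_within = sum(
        e * (c / e - xb) ** 2
        for cs, es, xb in zip(claims, exposures, xbar_i)
        for c, e in zip(cs, es)
    )
    v_hat = ss_within / sum(len(c) - 1 for c in claims)
    # Estimator (16.63) of a, using the overall average from (16.54).
    ss_between = sum(mi * (xb - xbar) ** 2 for mi, xb in zip(m_i, xbar_i))
    a_hat = (ss_between - v_hat * (r - 1)) / (m - sum(mi ** 2 for mi in m_i) / m)
    if a_hat <= 0:
        z = [0.0] * r          # no credibility when a_hat is not positive
        mu_hat = xbar
    else:
        k_hat = v_hat / a_hat
        z = [mi / (mi + k_hat) for mi in m_i]
        # Credibility-weighted mean (16.64), which preserves total losses.
        mu_hat = sum(zi * xb for zi, xb in zip(z, xbar_i)) / sum(z)
    premiums = [zi * xb + (1 - zi) * mu_hat for zi, xb in zip(z, xbar_i)]
    return v_hat, a_hat, z, mu_hat, premiums


claims = [[10_000, 13_000], [18_000, 21_000, 17_000]]   # Table 16.7
exposures = [[50, 60], [100, 110, 105]]
v_hat, a_hat, z, mu_hat, premiums = buhlmann_straub(claims, exposures)
total_premium = 110 * premiums[0] + 315 * premiums[1]
print(v_hat, a_hat, z, mu_hat, premiums, total_premium)
```

With unrounded arithmetic the total past credibility premium reproduces the total losses of 79,000 up to floating-point error, which is the property that (16.64) is designed to deliver.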

The above analysis assumes that the parameters $\mu$, $v$, and $a$ are all unknown 
and need to be estimated, and this may not always be the case. Also, it is 
assumed that $n_i > 1$ and $r > 1$. If $n_i = 1$, so that there is only one exposure 
unit's experience for policyholder $i$, it is difficult to obtain information on 
the process variance $v(\Theta_i)$ and thus $v$. Similarly, if $r = 1$, there is only one 
policyholder and it is difficult to obtain information on the variance of the 
hypothetical means $a$. In these situations, stronger assumptions are needed 
such as knowledge of one or more of the parameters (e.g., the pure premium 
or manual rate $\mu$, discussed below) or parametric assumptions which imply 
functional relationships between the parameters (discussed in Sections 16.5.2 
and 16.5.3). 

To illustrate these ideas, suppose, for example, that the manual rate $\mu$ 
may be already known, but estimates of $a$ and $v$ may be needed. In that case, 
(16.61) can still be used to estimate $v$ as it is unbiased whether $\mu$ is known 
or not. (Why is $[\sum_{j=1}^{n_i} m_{ij}(X_{ij} - \mu)^2]/n_i$ not unbiased for $v$ in this case?) 

Similarly, (16.63) is still an unbiased estimator for $a$. However, if $\mu$ is known, 
an alternative unbiased estimator for $a$ is 

$$\tilde{a} = \sum_{i=1}^{r}\frac{m_i}{m}(\bar{X}_i - \mu)^2 - \frac{r}{m}\hat{v},$$

where $\hat{v}$ is given by (16.61). To see this, note that 

$$E(\tilde{a}) = \sum_{i=1}^{r}\frac{m_i}{m}E[(\bar{X}_i - \mu)^2] - \frac{r}{m}E(\hat{v})$$
$$= \sum_{i=1}^{r}\frac{m_i}{m}\mathrm{Var}(\bar{X}_i) - \frac{r}{m}v$$
$$= \sum_{i=1}^{r}\frac{m_i}{m}\left(a + \frac{v}{m_i}\right) - \frac{r}{m}v = a.$$

If there are data on only one policyholder, an approach like this is necessary. 
Clearly, (16.60) provides an estimator for $v$ based on data from policyholder 
$i$ alone, and an unbiased estimator for $a$ based on data from policyholder $i$ 
alone is 

$$\tilde{a}_i = (\bar{X}_i - \mu)^2 - \frac{\hat{v}_i}{m_i},$$

which is unbiased because $E[(\bar{X}_i - \mu)^2] = \mathrm{Var}(\bar{X}_i) = a + v/m_i$ and $E(\hat{v}_i) = v$. 

Example 16.37 For a group policyholder, we have the following data available: 

                Year 1    Year 2    Year 3 
Total claims    60,000    70,000    - 
No. in group    125       150       200 

If the manual rate per person is 500 per year, estimate the total credibility 
premium for year 3. 

In the above notation, we have (assuming for notational purposes that this 
group is policyholder $i$) $m_{i1} = 125$, $X_{i1} = 60{,}000/125 = 480$, $m_{i2} = 150$, 
$X_{i2} = 70{,}000/150 = 466.67$, $m_i = m_{i1} + m_{i2} = 275$, and $\bar{X}_i = (60{,}000 + 70{,}000)/275 = 472.73$. 
Then 

$$\hat{v}_i = \frac{125(480 - 472.73)^2 + 150(466.67 - 472.73)^2}{2 - 1} = 12{,}115.15,$$

and with $\mu = 500$, $\tilde{a}_i = (472.73 - 500)^2 - (12{,}115.15/275) = 699.60$. We then 
estimate $k$ by $\hat{v}_i/\tilde{a}_i = 17.32$. The estimated credibility factor is 
$m_i/(m_i + \hat{v}_i/\tilde{a}_i) = 275/(275 + 17.32) = 0.94$. The estimated credibility premium per 
person is then 0.94(472.73) + 0.06(500) = 474.37 and the estimated total 
credibility premium for year 3 is 200(474.37) = 94,874. □ 
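
The single-policyholder calculation with a known manual rate, as in Example 16.37, can be sketched as follows. The function name `single_policyholder_premium` and its argument names are hypothetical. The code keeps full precision, so its results differ in the last digits from the rounded figures in the example.

```python
def single_policyholder_premium(claims, exposures, manual_rate, next_exposure):
    """Credibility premium for one policyholder when the manual rate mu is known."""
    n = len(claims)
    m_i = sum(exposures)
    xbar_i = sum(claims) / m_i
    # Estimator (16.60) of v from this policyholder alone.
    v_i = sum(e * (c / e - xbar_i) ** 2 for c, e in zip(claims, exposures)) / (n - 1)
    # Unbiased estimator of a that uses the known manual rate.
    a_i = (xbar_i - manual_rate) ** 2 - v_i / m_i
    # No credibility if the estimate of a is not positive.
    z = m_i / (m_i + v_i / a_i) if a_i > 0 else 0.0
    per_person = z * xbar_i + (1 - z) * manual_rate
    return z, per_person, next_exposure * per_person


z, per_person, total = single_policyholder_premium(
    claims=[60_000, 70_000], exposures=[125, 150], manual_rate=500, next_exposure=200
)
print(z, per_person, total)
```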

It is instructive to note that estimation of the parameters $a$ and $v$ based on 
data from a single policyholder (as in the example above) is not advised unless 
there is no alternative because the estimators $\hat{v}_i$ and $\tilde{a}_i$ have high variability. 
In particular, we are effectively estimating $a$ from one observation ($\bar{X}_i$). It is 
strongly suggested that an attempt be made to obtain more data. 


16.5.2 Semiparametric estimation 

In some situations it may be reasonable to assume a parametric form for the 
conditional distribution $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$. The situation at hand may suggest 
that such an assumption is reasonable or prior information may imply its 
appropriateness. 

For example, in dealing with numbers of claims, it may be reasonable 
to assume that the number of claims $m_{ij}X_{ij}$ for policyholder $i$ in year $j$ 
is Poisson distributed with mean $m_{ij}\theta_i$ given $\Theta_i = \theta_i$. Thus $E(m_{ij}X_{ij}|\Theta_i) = 
\mathrm{Var}(m_{ij}X_{ij}|\Theta_i) = m_{ij}\Theta_i$, implying that $\mu(\Theta_i) = v(\Theta_i) = \Theta_i$ and so $\mu = v$. 
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Example 16.38 In the past year, the distribution of automobile insurance 
policyholders by number of claims is given below. 

No. of claims    No. of insureds 
0                1,563 
1                271 
2                32 
3                7 
4                2 
Total            1,875 



For each policyholder, obtain a credibility estimate for the number of claims 
next year based on the past year's experience, assuming a (conditional) Poisson 
distribution of number of claims for each policyholder. 

Assume that we have $r = 1{,}875$ policyholders, $n_i = 1$ year experience on 
each, and exposures $m_{ij} = 1$. For policyholder $i$ (where $i = 1, \ldots, 1{,}875$) 
assume that $X_{i1}|\Theta_i = \theta_i$ is Poisson distributed with mean $\theta_i$ so that $\mu(\theta_i) = 
v(\theta_i) = \theta_i$ and $\mu = v$. As in Example 16.34, 

$$\bar{X} = \frac{1}{1{,}875}\sum_{i=1}^{1{,}875} X_{i1} = \frac{0(1{,}563) + 1(271) + 2(32) + 3(7) + 4(2)}{1{,}875} = 0.194.$$

Now, 

$$\mathrm{Var}(X_{i1}) = \mathrm{Var}[E(X_{i1}|\Theta_i)] + E[\mathrm{Var}(X_{i1}|\Theta_i)] = \mathrm{Var}[\mu(\Theta_i)] + E[v(\Theta_i)] = a + v = a + \mu.$$

Thus an unbiased estimator of $a + v$ is the sample variance 

$$\frac{\sum_{i=1}^{1{,}875}(X_{i1} - \bar{X})^2}{1{,}874} = \frac{1{,}563(0 - 0.194)^2 + 271(1 - 0.194)^2 + 32(2 - 0.194)^2 + 7(3 - 0.194)^2 + 2(4 - 0.194)^2}{1{,}874} = 0.226.$$

Thus $\hat{a} = 0.226 - 0.194 = 0.032$ and $\hat{k} = 0.194/0.032 = 6.06$ and the credibility 
factor $Z$ is $1/(1 + 6.06) = 0.14$. The estimated credibility premium for the 
number of claims for each policyholder is $(0.14)X_{i1} + (0.86)(0.194)$, where $X_{i1}$ 
is 0, 1, 2, 3, or 4, depending on the policyholder. □ 
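
The semiparametric calculation of Example 16.38 can be sketched as follows, assuming one year of data per policyholder and unit exposures. The function name `poisson_semiparametric` and the dictionary layout are hypothetical; the claim counts are those appearing in the sample variance calculation of the example.

```python
def poisson_semiparametric(counts):
    """counts[k] is the number of policyholders with k claims in the one observed year."""
    r = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / r
    variance = sum(c * (k - mean) ** 2 for k, c in counts.items()) / (r - 1)
    v_hat = mean                  # Poisson assumption: v = mu
    a_hat = variance - mean       # the sample variance estimates a + v
    # With n = 1, Z = 1 / (1 + v/a) = a / (a + v); zero if a_hat is not positive.
    z = a_hat / (a_hat + v_hat) if a_hat > 0 else 0.0
    return mean, a_hat, z


counts = {0: 1563, 1: 271, 2: 32, 3: 7, 4: 2}
mu_hat, a_hat, z = poisson_semiparametric(counts)
# Estimated credibility premium for a policyholder with k claims last year.
premiums = {k: z * k + (1 - z) * mu_hat for k in counts}
print(mu_hat, a_hat, z, premiums)
```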
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Example 16.39 Suppose we are interested in the probability that an individ¬ 
ual in a group makes a claim (e.g., group life insurance), and the probability 
is believed to vary by policyholder. Then $m_{ij}X_{ij}$ could represent the number 
of the $m_{ij}$ individuals in year $j$ for policyholder $i$ who made a claim. Develop 
a credibility model for this situation. 

If the claim probability is $\theta_i$ for policyholder $i$, then a reasonable model to 
describe this effect is that $m_{ij}X_{ij}$ is binomially distributed with parameters 
$m_{ij}$ and $\theta_i$ given $\Theta_i = \theta_i$. Then 

$$E(m_{ij}X_{ij}|\Theta_i) = m_{ij}\Theta_i \quad\text{and}\quad \mathrm{Var}(m_{ij}X_{ij}|\Theta_i) = m_{ij}\Theta_i(1 - \Theta_i)$$

and so $\mu(\Theta_i) = \Theta_i$ with $v(\Theta_i) = \Theta_i(1 - \Theta_i)$. Thus 

$$\mu = E(\Theta_i), \quad v = \mu - E[(\Theta_i)^2],$$
$$a = \mathrm{Var}(\Theta_i) = E[(\Theta_i)^2] - \mu^2 = \mu - v - \mu^2.$$

□ 

In these examples there is a functional relationship between the parameters 
$\mu$, $v$, and $a$ which follows from the parametric assumptions made, and this 
often facilitates estimation of parameters. 

16.5.3 Parametric estimation 

If fully parametric assumptions are made with respect to $f_{X_{ij}|\Theta}(x_{ij}|\theta_i)$ and 
$\pi(\theta_i)$ for $i = 1, \ldots, r$ and $j = 1, \ldots, n_i$, then the full battery of parametric 
estimation techniques is available in addition to the nonparametric methods 
discussed earlier. In particular, maximum likelihood estimation is straightforward 
(at least in principle) and is now discussed. For policyholder $i$, the 
joint density of $\mathbf{X}_i = (X_{i1}, \ldots, X_{in_i})^T$ is, by conditioning on $\Theta_i$, given for 
$i = 1, \ldots, r$ by 

$$f_{\mathbf{X}_i}(\mathbf{x}_i) = \int\left[\prod_{j=1}^{n_i} f_{X_{ij}|\Theta}(x_{ij}|\theta_i)\right]\pi(\theta_i)\,d\theta_i. \tag{16.65}$$

The likelihood function is given by 

$$L = \prod_{i=1}^{r} f_{\mathbf{X}_i}(\mathbf{x}_i). \tag{16.66}$$

Maximum likelihood estimators of the parameters are then chosen to maximize 
$L$ or equivalently $\ln L$. 

Example 16.40 As a simple example, suppose that $n_i = n$ for $i = 1, \ldots, r$ 
and $m_{ij} = 1$. Let $X_{ij}|\Theta_i$ be Poisson distributed with mean $\Theta_i$, that is, 
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aep< 


$$l'(\mu) = -\frac{r}{\mu} + \sum_{i=1}^{r}\frac{1 + \sum_{j=1}^{n} x_{ij}}{\mu(\mu n + 1)}.$$

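The maximum likelihood estimator $\hat{\mu}$ of $\mu$ is found by setting $l'(\hat{\mu}) = 0$, which 
yields 

$$\frac{r}{\hat{\mu}} = \frac{r + \sum_{i=1}^{r}\sum_{j=1}^{n} x_{ij}}{\hat{\mu}(\hat{\mu}n + 1)},$$

and so 

$$\hat{\mu}n + 1 = 1 + \frac{1}{r}\sum_{i=1}^{r}\sum_{j=1}^{n} x_{ij},$$

or 

$$\hat{\mu} = \frac{1}{rn}\sum_{i=1}^{r}\sum_{j=1}^{n} x_{ij} = \bar{X}.$$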

But this is the same as the nonparametric estimate obtained in Example 16.34. 
An explanation is in order. We have $\mu(\theta_i) = \theta_i$ by the Poisson assumption 
and so $E[\mu(\Theta_i)] = E(\Theta_i)$, which is the same $\mu$ as was used in the exponential 
distribution $\pi(\theta_i)$. 

Furthermore, $v(\theta_i) = \theta_i$ as well (by the Poisson assumption), and so $v = 
E[v(\Theta_i)] = \mu$. Also, $a = \mathrm{Var}[\mu(\Theta_i)] = \mathrm{Var}(\Theta_i) = \mu^2$ by the exponential 
assumption for $\pi(\theta_i)$. Thus the maximum likelihood estimators of $v$ and $a$ 
are $\hat{\mu}$ and $\hat{\mu}^2$ by the invariance of maximum likelihood estimation under a 
parameter transformation. Similarly, the maximum likelihood estimators of 
$k = v/a$, the credibility factor $Z$, and the credibility premium $Z\bar{X}_i + (1 - Z)\mu$ 
are $\hat{k} = \hat{\mu}^{-1} = \bar{X}^{-1}$, $\hat{Z} = n/(n + \hat{\mu}^{-1})$, and $\hat{Z}\bar{X}_i + (1 - \hat{Z})\hat{\mu}$, respectively. 
We mention also that credibility is exact in this model so that the Bayesian 
premium is equal to the credibility premium. □ 
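
The claim that the maximum likelihood estimator equals the sample mean can be checked numerically. The sketch below assumes the Poisson model with an exponential prior of mean $\mu$; integrating $\theta_i$ out of (16.65) under that assumption gives, up to factors free of $\mu$, a contribution of $\mu^{-1}(n + \mu^{-1})^{-(s_i+1)}$ from policyholder $i$, where $s_i = \sum_j x_{ij}$. The claim counts in `data`, the grid, and the function name `log_likelihood` are hypothetical.

```python
import math


def log_likelihood(mu, data):
    """Log-likelihood in mu, up to an additive constant, for Poisson counts
    whose means follow an exponential distribution with mean mu."""
    n = len(data[0])
    total = 0.0
    for row in data:
        s = sum(row)
        # Contribution of one policyholder after integrating out theta.
        total += -math.log(mu) - (s + 1) * math.log(n + 1.0 / mu)
    return total


data = [[0, 1, 0], [2, 1, 3], [0, 0, 1], [1, 0, 0]]  # made-up counts, r = 4, n = 3
r, n = len(data), len(data[0])
xbar = sum(map(sum, data)) / (r * n)

# Crude grid search over mu; the maximizer should sit at the sample mean.
grid = [0.01 * i for i in range(1, 501)]
best = max(grid, key=lambda mu: log_likelihood(mu, data))

k_hat = 1.0 / xbar
z_hat = n / (n + k_hat)
print(xbar, best, k_hat, z_hat)
```

A grid search is used only to keep the sketch dependency-free; any one-dimensional optimizer would serve the same purpose.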

Example 16.41 Suppose that rii = n for all i and rrtij = 1. Assume that 
XijlQi^NiQuv), 


= (2th;) 一 1/2 exp 


一 OO < Xij < oo, 


and Qi 〜 so that 

Tr ( 0 i ) = (2一一 1/2 exp [-士 (氏一 ") 2 j ， 一 00 < < °°- 

Determine the maximum likelihood estimators of the parameters. 

We have fi(0i) = 9i and v(0i) = v. Thus fjb = E[/x(0i)], v — E[v(©i)], and 
a = Var[^(0i)], consistent with, previous use of /x, v, and a. We shall now 


Xi = n" 1 Conditional on 0j, the are independent N(Q{ 1 v) 

random variables, implying that 兄 |㊀ i 〜 N(Q^v/n). Because ©i 〜 N(ji, a), 
it follows from Example 4.30 that unconditionally 文 i ~ N(jjb 1 a+v/n'y Hence 
the density of Xi is, with w = a-v/n 、 

f(xi) = (27rw)^ 2 exp [ 一士闲 —m) 2 | ? 一 00 < < oo. 

On the other hand, by conditioning on ㊀ “ we have 

/(®i) = J (27ro/7i) -1/2 exp [- 垚(至 i_0i) 2 ] 
x (2ira)~ 1/2 exp [-士 的 - 卩 ) 2 ] 祇 . 

Ignoring terms not involving v, or a, this means that f(xi) is proportional 
to 

v~ 1/2 a~ 1/2 J exp - Qif — 士 ( 氏 -p) 2 ) 祇 . 

Now (16.65) yields 

/( Xi ) = 乂二{只(2— - 1/2 exp [— 去(印一叫卜抓)- 1 / 2 

x exp [-士 (& _ M) 2 | d0 u 

which is proportional to 

v~ n/2 a~ 1/2 J exp J2( X H - 0 i) 2 - ^( 0 i - M) 2 d0 i- 

Now use the identity (16.10) restated as 

Y^(xij - 9i) 2 = - Xif + n{xi - 6i) 2 , 

j_=i i=i 

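which means that $f(\mathbf{x}_i)$ is proportional to

$$v^{-n/2} a^{-1/2} \int \exp\left\{-\frac{1}{2v}\left[\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)^2 + n(\bar{x}_i - \theta_i)^2\right] - \frac{1}{2a}(\theta_i - \mu)^2\right\} d\theta_i,$$

itself proportional to

$$v^{-(n-1)/2} \exp\left[-\frac{1}{2v}\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)^2\right] f(\bar{x}_i),$$

using the second expression for the density $f(\bar{x}_i)$ of $\bar{X}_i$ given above. Then
(16.66) yields

$$L \propto v^{-r(n-1)/2} \exp\left[-\frac{1}{2v}\sum_{i=1}^{r}\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)^2\right] \prod_{i=1}^{r} f(\bar{x}_i).$$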


parameter transformation and use $\mu$, $v$, and $w = a + v/n$ rather than $\mu$, $v$,
and $a$. This means that

$$L \propto L_1(v)\,L_2(\mu, w),$$

where

$$L_1(v) = v^{-r(n-1)/2} \exp\left[-\frac{1}{2v}\sum_{i=1}^{r}\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)^2\right].$$



exactly the nonparametric unbiased estimators in the Bühlmann model of
Example 16.34. The maximum likelihood estimator $\hat{a}$ is almost the same as
the nonparametric unbiased estimator, the only difference being the divisor $r$
rather than $r - 1$ in the first term. □
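The estimators described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the small data set are hypothetical, every policyholder is assumed to have the same number of years $n$, and no adjustment is made if $\hat{a}$ turns out negative (a constrained maximum would then sit at zero).

```python
from statistics import fmean


def normal_normal_mle(data):
    """Return (mu_hat, v_hat, a_hat) for data[i][j] = x_ij with equal n per row.

    v_hat uses the divisor r(n - 1); w_hat, the estimate of w = a + v/n,
    uses the divisor r; a_hat is recovered as w_hat - v_hat / n.
    """
    r = len(data)
    n = len(data[0])
    means = [fmean(row) for row in data]          # the x-bar_i
    mu_hat = fmean(means)                         # overall mean
    within = sum((x - m) ** 2 for row, m in zip(data, means) for x in row)
    v_hat = within / (r * (n - 1))
    w_hat = sum((m - mu_hat) ** 2 for m in means) / r
    a_hat = w_hat - v_hat / n                     # may be negative; not adjusted here
    return mu_hat, v_hat, a_hat


def credibility_premiums(data):
    """Credibility premium Z * x-bar_i + (1 - Z) * mu_hat for each policyholder."""
    n = len(data[0])
    mu_hat, v_hat, a_hat = normal_normal_mle(data)
    z = n / (n + v_hat / a_hat)                   # requires a_hat > 0
    return [z * fmean(row) + (1.0 - z) * mu_hat for row in data]
```

For the hypothetical data `[[1, 3], [5, 9]]` the within sum of squares is 10, so $\hat{v} = 5$, $\hat{w} = 6.25$, $\hat{a} = 3.75$, and $\hat{Z} = 0.6$.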

16.5.4 Notes and References 

In this section a simple approach was employed to find parameter estimates. 
No attempt was made to find optimum estimators in the sense of minimum 
variance. A good deal of research has been done on this problem. See 
Goovaerts and Hoogstad [46] for more details and further references.

16.5.5 Exercises 

16.60 Past claims data on a portfolio of policyholders are given in Table 16.8.
Estimate the Bühlmann credibility premium for each of the three policyholders
for year 4.

16.61 Past data on a portfolio of group policyholders are given in Table 16.9.
Estimate the Bühlmann-Straub credibility premiums to be charged to each
group in year 4.

16.62 For the situation in Exercise 16.9, estimate the Bühlmann credibility
premium for the next year for the policyholder.


Table 16.8 Data for Exercise 16.60

Table 16.9 Data for Exercise 16.61

                                       Year
Policyholder                   1        2        3        4
1    Claims                    -   20,000   25,000        -
     No. in group              -      100      120      110
2    Claims               19,000   18,000   17,000        -
     No. in group             90       75       70       60
3    Claims               26,000   30,000   35,000        -
     No. in group            150      175      180      200


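16.63 Consider the Bühlmann model in Example 16.34.

(a) Prove that $\mathrm{Var}(X_{ij}) = a + v$.

(b) If $\{X_{ij} : i = 1, \ldots, r \text{ and } j = 1, \ldots, n\}$ are unconditionally independent
for all $i$ and $j$, argue that an unbiased estimator of $a + v$
is

$$\frac{1}{nr - 1}\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X})^2.$$

(c) Prove the algebraic identity

$$\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X})^2 = \sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2 + n\sum_{i=1}^{r}(\bar{X}_i - \bar{X})^2.$$

(d) Show that, unconditionally,

$$E\left[\frac{1}{nr - 1}\sum_{i=1}^{r}\sum_{j=1}^{n}(X_{ij} - \bar{X})^2\right] = (v + a) - \frac{n - 1}{nr - 1}\,a.$$

(e) Comment on the implications of (b) and (d).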

16.64 The distribution of automobile insurance policyholders by number of 
claims is given in Table 16.10. 

Assuming a (conditional) Poisson distribution for the number of claims per
policyholder, estimate the Bühlmann credibility premiums for the number of
claims next year.

Table 16.10 Data for Exercise 16.64

No. of claims    No. of insureds
0                          2,500
1                            250
2                             30
3                              5
4                              2
Total                      2,787

16.65 Suppose that, given $\Theta$, $X_1, \ldots, X_n$ are independently geometrically
distributed with pf

$$f_{X_j|\Theta}(x_j|\theta) = \frac{1}{1 + \theta}\left(\frac{\theta}{1 + \theta}\right)^{x_j}, \quad x_j = 0, 1, \ldots.$$

(a) Show that $\mu(\theta) = \theta$ and $v(\theta) = \theta(1 + \theta)$.

(b) Prove that $a = v - \mu - \mu^2$.

(c) Rework Exercise 16.64 assuming a (conditional) geometric distribution.

16.66 Suppose that 

and 

= q . > o. 

Write down the equation satisfied by the maximum likelihood estimator $\hat{\mu}$ of
$\mu$ for Bühlmann-Straub-type data.

16.67 (a) Prove the algebraic identity 

i=l i=l i=l i=l 

(b) Use part (a) and (16.61) to show that (16.63) may be expressed as 



16.68 (*) A group of 340 insureds in a high-crime area submit the 210 theft
claims in a one-year period as given in Table 16.11.
Each insured is assumed to have a Poisson distribution for the number of
claims, but the mean of such a distribution may vary from one insured to
another. If a particular insured experienced two claims in the observation
period, determine the Bühlmann credibility estimate for the number of claims
for that insured in the next period.

Table 16.11 Data for Exercise 16.68

Number of claims    Number of insureds
0                                  200
1                                   80
2                                   50
3                                   10
BASICS OF SIMULATION



Ripley [110], and Ross [115] will provide many important additional insights. 
In addition, simulation can also be an aid in evaluating some of the statistical 
techniques covered in earlier chapters. This will also be covered here with an 
emphasis on the bootstrap method. 

17.1.1 The simulation approach 

The beauty of simulation is that once the model is created little additional
creative thought is required.¹ The entire process can be summarized in four
steps, where the goal is to determine values relating to the distribution of a
random variable $S$.

1. Build a model for $S$ which depends on random variables $X, Y, Z, \ldots$,
where their distributions and any dependencies are known.

2. For $j = 1, \ldots, n$ generate pseudorandom values $x_j, y_j, z_j, \ldots$ and then
compute $s_j$ using the model from step 1.

3. The cdf of $S$ may be approximated by $F_n(s)$, the empirical cdf based on
the pseudorandom sample $s_1, \ldots, s_n$.

4. Compute quantities of interest, such as the mean, variance, percentiles,
or probabilities, using the empirical cdf.

Two questions remain. First, what does it mean to generate a pseudorandom
variable? Consider a random variable $X$ with cdf $F_X(x)$. This is the
real random variable produced by some phenomenon of interest. For example,
it may be the result of the experiment "collect one automobile bodily injury
medical payment at random and record its value." We assume that the cdf is
known. For example, it may be the Pareto cdf, $F_X(x) = 1 - \left(\frac{\theta}{\theta + x}\right)^{\alpha}$. Now
consider a second random variable, $X^*$, resulting from some other process, but
with the same Pareto distribution. A random sample from $X^*$, say $x_1^*, \ldots, x_n^*$,
would be impossible to distinguish from one taken from $X$. That is, given the
$n$ numbers, we could not tell if they arose from automobile claims or something
else. This means that, instead of learning about $X$ by observing automobile
claims, we could learn about it by observing $X^*$. Obtaining a random sample
from a Pareto distribution is still probably difficult, so we have not yet
accomplished much.

We can make some progress by making a concession. Let us accept as a
replacement for a random sample from $X^*$ a sequence of numbers $x_1^{**}, \ldots, x_n^{**}$ that
may not be independent, or even random, but was generated by some known
process that is related to the random variable $X^*$. Such a sequence is called
a pseudorandom sequence because anyone who did not know its origin could
not distinguish it from a random sample from $X^*$ (and therefore from $X$).
Such a sequence will be satisfactory for our purposes.

The field of developing processes for generating pseudorandom sequences
of numbers has been well developed. One fact that makes it easier to do this
is that it is sufficient to be able to generate such sequences for the uniform
distribution on the interval (0, 1). That is because, if $U$ has the uniform(0, 1)
distribution, then $X = F_X^{-1}(U)$ will have $F_X(x)$ as its cdf. Therefore, we
simply obtain uniform pseudorandom numbers $u_1^{**}, \ldots, u_n^{**}$ and then let $x_j^{**} = F_X^{-1}(u_j^{**})$.
This is called the inversion method of generating random variates.
Specific methods for particular distributions have been developed but will
not be discussed here. There is a considerable literature on the best ways to
generate pseudorandom uniform numbers and a variety of tests proposed to
evaluate them. Readers are cautioned to ensure that the one being used is a good
one.


Example 17.1 Generate 10,000 pseudo-Pareto (with $\alpha = 3$ and $\theta = 1,000$)
variates and verify that they are indistinguishable from real Pareto observations.

The pseudouniform values were obtained using the built-in generator supplied
with a commercial programming language. The pseudo-Pareto values
are calculated from

$$u^{**} = 1 - \left(\frac{1,000}{1,000 + x^{**}}\right)^3.$$

That is,

$$x^{**} = 1,000[(1 - u^{**})^{-1/3} - 1].$$

So, for example, if the first value generated is $u_1^{**} = 0.54246$, we have $x_1^{**} = 297.75$.
This was repeated 10,000 times. The results are displayed in Table 17.1,
where a chi-square goodness-of-fit test is conducted. The expected
counts are calculated using the Pareto distribution with $\alpha = 3$ and $\theta = 1,000$.
Because the parameters are known, there are nine degrees of freedom. At a
significance level of 5% the critical value is 16.92, and we conclude that the
pseudorandom sample could have been a random sample from this Pareto
distribution. □
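The inversion step and the goodness-of-fit check of Example 17.1 can be sketched as follows. This is an illustration only: it uses Python's built-in generator rather than the authors' (unnamed) one, and because the grouping used in Table 17.1 is not reproduced here, the ten equiprobable classes are an assumption chosen to give the same nine degrees of freedom. The function names and the seed are hypothetical.

```python
import random


def pareto_inverse_cdf(u, alpha=3.0, theta=1000.0):
    """Inversion method: solve u = 1 - (theta / (theta + x))**alpha for x."""
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def pareto_cdf(x, alpha=3.0, theta=1000.0):
    return 1.0 - (theta / (theta + x)) ** alpha


def chi_square_statistic(sample, n_classes=10):
    """Chi-square statistic using classes of equal probability under the Pareto."""
    counts = [0] * n_classes
    for x in sample:
        k = min(int(n_classes * pareto_cdf(x)), n_classes - 1)
        counts[k] += 1
    expected = len(sample) / n_classes
    return sum((c - expected) ** 2 / expected for c in counts)


def pseudo_pareto_sample(n=10_000, seed=12345):
    rng = random.Random(seed)           # arbitrary seed, for reproducibility only
    return [pareto_inverse_cdf(rng.random()) for _ in range(n)]


if __name__ == "__main__":
    stat = chi_square_statistic(pseudo_pareto_sample())
    # Compare with the 5% critical value of 16.92 quoted in the example.
    print(f"chi-square statistic: {stat:.2f}")
```

With parameters known and ten classes, the statistic is compared with the critical value 16.92 from the text; a different grouping would change the statistic but not the logic.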




When 


ibution fii] 




















any consistent estimator will be arbitrarily close to the true value with high
probability as the sample size is increased. In particular, empirical estimators
have this attribute. With a little effort we should be able to determine the
value of $n$ that will get us as close as we want with a specified probability.
Often, the central limit theorem will help, as in the following example.

Example 17.5 (Example 17.1 continued) Use simulation to estimate the
mean, $F_X(1,000)$, and $\pi_{0.9}$, the 90th percentile of the Pareto distribution with
$\alpha = 3$ and $\theta = 1,000$. In each case, stop the simulations when you are 95%
confident that the answer is within ±1% of the true value.

In this example we know the values. Here, $\mu = 500$, $F_X(1,000) = 0.875$,
and $\pi_{0.9} = 1,154.43$. For instructional purposes we will behave as if we do not
know these values.

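The empirical estimate of $\mu$ is $\bar{x}$. The central limit theorem tells us that
for a sample of size $n$

$$0.95 = \Pr(0.99\mu \le \bar{X}_n \le 1.01\mu)
= \Pr\left(-\frac{0.01\mu}{\sigma/\sqrt{n}} \le \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le \frac{0.01\mu}{\sigma/\sqrt{n}}\right)
\approx \Pr\left(-\frac{0.01\mu}{\sigma/\sqrt{n}} \le Z \le \frac{0.01\mu}{\sigma/\sqrt{n}}\right),$$

where $Z$ has the standard normal distribution. Our goal is achieved when

$$\frac{0.01\mu}{\sigma/\sqrt{n}} = 1.96, \qquad (17.1)$$

which means $n = 38,416(\sigma/\mu)^2$. Because $\sigma$ and $\mu$ are not known, they are
estimated by the sample standard deviation $s$ and the sample mean $\bar{x}$. The
estimates improve with $n$, so our stopping rule is to cease simulating when

$$n \ge \frac{38,416\,s^2}{\bar{x}^2}.$$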
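A stopping rule of this kind, based on (17.1) with $\sigma$ and $\mu$ replaced by running sample estimates, might be coded as below. This is a sketch under assumptions: the minimum of 100 simulations before the rule is first checked, the seed, and the function name are illustrative choices, not part of the text.

```python
import random


def simulate_pareto_mean(seed=2024, min_n=100, z=1.96, rel_tol=0.01):
    """Simulate Pareto(alpha=3, theta=1000) values until n >= (z/rel_tol)^2 s^2 / xbar^2.

    Returns (n, sample_mean). The constant (1.96 / 0.01)^2 equals 38,416.
    """
    rng = random.Random(seed)
    threshold = (z / rel_tol) ** 2
    n, total, total_sq = 0, 0.0, 0.0
    while True:
        x = 1000.0 * ((1.0 - rng.random()) ** (-1.0 / 3.0) - 1.0)  # inversion
        n += 1
        total += x
        total_sq += x * x
        if n >= min_n:                    # avoid stopping on a tiny, unstable sample
            mean = total / n
            var = (total_sq - n * mean * mean) / (n - 1)
            if n >= threshold * var / (mean * mean):
                return n, mean


if __name__ == "__main__":
    n, mean = simulate_pareto_mean()
    print(n, round(mean, 2))
```

Because the rule depends on the simulated values themselves, the stopping point varies from run to run; with the true moments ($\sigma^2/\mu^2 = 3$) it would be near 115,000.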


For a particular simulation conducted by the authors, the criterion was met
when $n$ = 伽,934, at which point $\bar{x} = 501.15$, a relative error of 0.23%, well
within our goal.

We now turn to the estimation of $F_X(1,000)$. The empirical estimator
is the sample proportion below 1,000, say $P_n/n$, where $P_n$ is the number
below 1,000 after $n$ simulations. The central limit theorem tells us that $P_n/n$
is approximately normal with mean $F_X(1,000)$ and variance $F_X(1,000)[1 -
F_X(1,000)]/n$. Arguing as above, the requirement will be met when



17.2 Use the uniform random
from

17.3 Demonstrate that $0.95 = \Pr(Y_a \le \pi_{0.9} \le Y_b)$ for $Y_a$ and $Y_b$ as defined
in Example 17.5.

17.4 You are simulating observations from an exponential distribution with
$\theta = 100$. How many simulations are needed to be 90% certain of being within
2% of each of the mean and the probability of being below 200? Conduct the
required number of simulations and note if the 2% goal has been reached.

17.5 Simulate 1,000 observations from a gamma distribution with $\alpha = 2$ and
$\theta = 500$. Perform the chi-square goodness-of-fit and Kolmogorov-Smirnov
tests to see if the simulated values were actually from that distribution.



17.2.2 Examples of lack of independence or identical distributions

There are two common ways to have the assumption fail to hold. One is
through accounting for time (and in particular the time value of money) and
the other is through coverage modifications. The latter may have a time factor
as well. The following examples provide some illustrations.


is continued). This is reasonable because the more expensive losses may take
longer to settle.

Finally, let $V_t$ be a random variable that represents the value which, if
invested today, will accumulate to 1 in $t$ years. It is independent of all $X_j$,
$C_j$, and $L_j$. But clearly, for $s \ne t$, $V_s$ and $V_t$ are dependent.

We then have

$$S = \sum_{j=1}^{N} X_j V_{C_j + L_j},$$

where $N = \max_{C_j < 1}\{j\}$. The various dependencies were established in the
development of the random variables.

Example 17.7 (Out-of-pocket maximum) Suppose there is a deductible 
























imposing the out-of-pocket maximum. Then the amount actually paid by the
policyholder is $R_u = R \wedge u$. Let $S = X_1 + \cdots + X_N$ be the total losses, and
then the aggregate amount paid by the insurer is $T = S - R_u$. Note that
the distributions of $S$ and $R_u$ are based on i.i.d. severity distributions. The
analytic methods described earlier can be used to obtain their distributions.
But because they are dependent, their individual distributions cannot be combined
to produce the distribution of $T$. There is also no way to write $T$ as a
random sum of i.i.d. variables. At the beginning of the year, it appears that
$T$ will be the sum of i.i.d. per-loss payments, but at some point these may be
replaced by $X_j$s as the out-of-pocket maximum is reached. □

17.2.3 Simulation analysis of the two examples 

We now complete the two examples using the simulation approach. The models
have been selected arbitrarily, but we should assume they were determined
by a careful estimation process using the techniques presented earlier in this
text.

Example 17.8 (Example 17.6 continued) The model is completed with the
following specifications. The amount of a payment ($X_j$) has the Pareto
distribution with parameters $\alpha = 3$ and $\theta = 1,000$. The time from the occurrence
of a claim to its payment ($L_j$) has a Weibull distribution with $\tau = 1.5$ and
$\theta = \ln(X_j)/6$. This models the dependence by having the scale parameter
depend on the size of the loss. The discount factor will be modeled by assuming
that, for $t > s$, $[\ln(V_s/V_t)]/(t - s)$ has a normal distribution with mean 0.06
and variance $0.0004(t - s)$. We do not need to specify a model for the number
of losses. Instead, we use the model given earlier for the time between losses.
Use simulation to determine the expected present value of aggregate payments.

The mechanics of a single simulation will be done in detail, and that should
indicate how the process is to be done. Begin by generating i.i.d. exponential
interloss times until their sum exceeds 1 (in order to obtain one year's worth of
claims). The individual variates are generated from pseudouniform numbers
using

$$u = 1 - e^{-5x},$$

which yields

$$x = -0.2\ln(1 - u).$$

For the first simulation, the uniform pseudorandom numbers and the corresponding
$x$ values are (0.25373, 0.0585), (0.46750, 0.1260), (0.23709, 0.0541),
(0.75780, 0.2836), and (0.96642, 0.6788). At this point the simulated $x$s total
1.2010 and therefore there are four loss events, occurring at times $c_1 = 0.0585$,
$c_2 = 0.1845$, $c_3 = 0.2386$, and $c_4 = 0.5222$.

The four loss amounts are found from inverting the Pareto cdf. That is,

$$x = 1,000[(1 - u)^{-1/3} - 1].$$

The four pseudouniform numbers are 0.71786, 0.47779, 0.61084, and 0.68579.
This produces the four losses $x_1 = 524.68$, $x_2 = 241.80$, $x_3 = 369.70$, and
$x_4 = 470.93$.

The times from occurrence to payment have a Weibull distribution. The
equation to solve is

$$u = 1 - \exp\left\{-\left[\frac{6l}{\ln(x)}\right]^{1.5}\right\},$$

where $x$ is the loss. Solving for the lag time $l$ yields

$$l = \tfrac{1}{6}\ln(x)[-\ln(1 - u)]^{2/3}.$$

For the first lag we have $u = 0.23376$ and so

$$l_1 = \tfrac{1}{6}\ln(524.68)[-\ln 0.76624]^{2/3} = 0.4320.$$

Similarly, with the next three values of $u$ being 0.85799, 0.12951, and 0.72085,
we have $l_2 = 1.4286$, $l_3 = 0.2640$, and $l_4 = 1.2068$. The payment times of
the four losses are the sum of $c_j$ and $l_j$, namely $t_1 = 0.4905$, $t_2 = 1.6131$,
$t_3 = 0.5026$, and $t_4 = 1.7290$.

Finally, we generate the discount factors. They must be generated in order
of increasing $t_j$, so we first obtain $v_{0.4905}$. We begin with a normal variate
with mean 0.06 and variance 0.0004(0.4905) = 0.0001962. Using inversion,
the simulated value is $0.0592 = [\ln(1/v_{0.4905})]/0.4905$ and so $v_{0.4905} = 0.9714$.
Note that for the first value we have $s = 0$, and $v_0 = 1$. For the second
value we require a normal variate with mean 0.06 and variance (0.5026 -
0.4905)(0.0004) = 0.00000484. The simulated value is

$$0.0604 = \frac{\ln(0.9714/v_{0.5026})}{0.0121} \quad \text{for } v_{0.5026} = 0.9707.$$

For the next two payments, we have

$$0.0768 = \frac{\ln(0.9707/v_{1.6131})}{1.1105} \quad \text{for } v_{1.6131} = 0.8913,$$

$$0.0628 = \frac{\ln(0.8913/v_{1.7290})}{0.1159} \quad \text{for } v_{1.7290} = 0.8848.$$

We are now ready to determine the first simulated value of the aggregate
present value. It is

$$s_1 = 524.68(0.9714) + 241.80(0.8913) + 369.70(0.9707) + 470.93(0.8848) = 1,500.74.$$

The process was then repeated until there was 95% confidence that the estimated
mean was within 1% of the true mean. This took 26,944 simulations,
producing a sample mean of 2,299.16. □
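The single simulation worked through in Example 17.8 can be sketched as one function. This is an illustration, not the authors' program: the function names and the callable-based interface are assumptions made so that the same code can either replay the pseudorandom numbers quoted above or draw fresh ones. The sketch does not guard against losses below 1, for which $\ln(x)$ and hence the Weibull scale would be negative.

```python
import math
import random


def simulate_present_value(uniform, rate):
    """One simulated present value of a year's payments.

    uniform() returns a pseudouniform number; rate(dt) returns the simulated
    value of ln(V_s / V_t) / (t - s) for an interval of length dt = t - s.
    """
    # Loss times: exponential interloss times (mean 0.2) until the total exceeds 1.
    times, clock = [], 0.0
    while True:
        clock += -0.2 * math.log(1.0 - uniform())
        if clock > 1.0:
            break
        times.append(clock)
    # Loss amounts: Pareto with alpha = 3, theta = 1,000, by inversion.
    losses = [1000.0 * ((1.0 - uniform()) ** (-1.0 / 3.0) - 1.0) for _ in times]
    # Lags: Weibull with tau = 1.5 and scale ln(x) / 6, by inversion.
    lags = [math.log(x) / 6.0 * (-math.log(1.0 - uniform())) ** (2.0 / 3.0)
            for x in losses]
    # Discount factors are generated in order of increasing payment time.
    payments = sorted((c + l, x) for c, l, x in zip(times, lags, losses))
    total, s, v = 0.0, 0.0, 1.0
    for t, x in payments:
        v *= math.exp(-rate(t - s) * (t - s))
        total += x * v
        s = t
    return total


def random_inputs(seed):
    """Fresh pseudorandom inputs (uses Python's normal generator, not inversion)."""
    rng = random.Random(seed)
    return rng.random, lambda dt: rng.gauss(0.06, math.sqrt(0.0004 * max(dt, 0.0)))
```

Replaying the thirteen uniform numbers and the four simulated rates from the example (0.0592, 0.0604, 0.0768, 0.0628, in order of payment time) should reproduce a present value close to 1,500.74, up to the rounding of the quoted rates.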

Example 17.9 (Example 17.7 continued) For this example, set the deductible
$d$ at 250 and the out-of-pocket maximum at $u = 1,000$. Assume that the
number of losses has the negative binomial distribution with $r = 3$ and $\beta = 2$.
Further assume that individual losses have the Weibull distribution with $\tau = 2$
and $\theta = 600$. Determine the 95th percentile of the insurer's losses.

Table 17.2 Negative binomial cumulative probabilities

 n    F_N(n)      n    F_N(n)
 0    0.03704     8    0.76589
 1    0.11111     9    0.81888
 2    0.20988    10    0.86127
 3    0.31962    11    0.89467
 4    0.42936    12    0.92064
 5    0.53178    13    0.94062
 6    0.62282    14    0.95585
 7    0.70086    15    0.96735

In order to simulate the negative binomial claim counts, we require the cdf
of the negative binomial distribution. There is no closed form, but a table
can be constructed, and one appears here as Table 17.2. The number of losses
for the year is generated by obtaining one pseudouniform value (for example,
$u = 0.47515$) and then determining the smallest entry in the table that is
larger than 0.47515. The simulated value appears to its left. In this case our
first simulation produced $n = 5$ losses.

The amounts of the five losses are obtained from the Weibull distribution.
Inversion of the cdf produces

$$x = 600[-\ln(1 - u)]^{1/2}.$$

The five simulated values are 544.04, 453.67, 217.87, 681.98, and 449.83. The
total loss is 2,347.39. The policyholder pays 250.00 + 250.00 + 217.87 + 250.00 +
250.00 = 1,217.87, but the out-of-pocket maximum limits this to 1,000. Thus
our first simulated value has the insurer paying 1,347.39.

The goal was set to be 95% confident that the estimated 95th percentile
would be within 2% of the true value. This required 11,476 simulations,
producing an estimated 95th percentile of 6,668.18. □
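The steps of Example 17.9 can be sketched as below. The sketch is illustrative only: the function names are hypothetical, the negative binomial cdf is accumulated by the usual recursion for its probabilities instead of being read from Table 17.2, and the percentile is taken as a simple order statistic, which is an assumption about how the empirical percentile is defined.

```python
import math
import random


def negative_binomial_count(u, r=3.0, beta=2.0):
    """Smallest n whose cumulative probability exceeds u."""
    n = 0
    p = (1.0 + beta) ** (-r)            # Pr(N = 0)
    cdf = p
    while cdf <= u:
        n += 1
        p *= (r + n - 1.0) / n * beta / (1.0 + beta)
        cdf += p
    return n


def weibull_loss(u, theta=600.0, tau=2.0):
    """Inversion of the Weibull cdf."""
    return theta * (-math.log(1.0 - u)) ** (1.0 / tau)


def insurer_payment(losses, d=250.0, max_oop=1000.0):
    """Total losses less what the policyholder pays (deductibles, capped at max_oop)."""
    policyholder = min(sum(min(x, d) for x in losses), max_oop)
    return sum(losses) - policyholder


def simulated_percentile(n_sims=11_476, level=0.95, seed=7):
    rng = random.Random(seed)           # arbitrary seed
    results = []
    for _ in range(n_sims):
        count = negative_binomial_count(rng.random())
        results.append(insurer_payment([weibull_loss(rng.random()) for _ in range(count)]))
    results.sort()
    return results[int(level * n_sims)]
```

For the first simulation in the text, `negative_binomial_count(0.47515)` gives five losses, and the five quoted loss amounts give an insurer payment of 1,347.39.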
Example 17.10 It is conjectured that losses have a lognormal distribution.
One hundred observations have been collected and the Kolmogorov-Smirnov
test statistic is 0.06272. Determine the p-value for this test, first with the null
hypothesis being that the distribution is lognormal with $\mu = 7$ and $\sigma = 1$ and
then with the parameters unspecified.

For the null hypothesis with each parameter specified, one simulation involves
first simulating 100 lognormal observations from the specified lognormal
distribution. Then the Kolmogorov-Smirnov test statistic is calculated. The
estimated p-value is the proportion of simulations for which the test statistic
exceeds 0.06272. After 1,000 simulations, the estimate of the p-value is 0.836.

With the parameters unspecified, it is not clear which lognormal distribution
should be used. It turned out that for the observations actually collected
$\hat{\mu} = 7.2201$ and $\hat{\sigma} = 0.80893$. These were used as the basis for each simulation.
The only change is that after the simulated observations have been
obtained, the results are compared to a lognormal distribution with parameters
estimated (by maximum likelihood) from the simulated data set. For
1,000 simulations, the test statistic exceeded 0.06272 491 times, for an estimated
p-value of 0.491.

As indicated in Section 13.4.1, not specifying the parameters makes a considerable
difference in the interpretation of the test statistic. □
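Both p-value calculations in Example 17.10 can be sketched with one function. This is a sketch under assumptions: it uses Python's `random.lognormvariate` rather than inversion, the seed is arbitrary, and the `refit` flag and function names are hypothetical. Different pseudorandom numbers will give estimates that differ somewhat from 0.836 and 0.491.

```python
import math
import random


def lognormal_cdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_statistic(sample, mu, sigma):
    """Kolmogorov-Smirnov statistic against a lognormal(mu, sigma) cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = lognormal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulated_p_value(observed, mu, sigma, refit=False, n_obs=100, n_sims=1000, seed=1):
    """Proportion of simulated test statistics exceeding the observed one.

    With refit=True the lognormal parameters are re-estimated by maximum
    likelihood from each simulated sample before the statistic is computed.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n_obs)]
        m, s = mu, sigma
        if refit:
            logs = [math.log(x) for x in sample]
            m = sum(logs) / n_obs
            s = math.sqrt(sum((y - m) ** 2 for y in logs) / n_obs)
        if ks_statistic(sample, m, s) > observed:
            exceed += 1
    return exceed / n_sims
```

A call such as `simulated_p_value(0.06272, 7.0, 1.0)` corresponds to the fully specified null hypothesis, and `simulated_p_value(0.06272, 7.2201, 0.80893, refit=True)` to the case with parameters estimated.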

When testing hypotheses, p-values and significance levels are calculated
assuming the null hypothesis to be true. In other situations, there is no
known population distribution from which to simulate. For such situations, a
technique called the bootstrap (see [33] for thorough coverage of this subject)
may help. The key is to use the empirical distribution from the data as
the population from which to simulate values. Theoretical arguments show 
that at least asymptotically the bootstrap estimate will converge to the true 
value. This is reasonable because as the sample size increases the empirical 
distribution becomes more and more like the true distribution. The following 
example shows how the bootstrap works and also indicates that, at least in 
the case illustrated, it gives a reasonable answer. 

Example 17.11 A sample (with replacement) of size 3 from a population
produced the values 2, 3, and 7. Determine the bootstrap estimate of the
mean-squared error of the sample mean as an estimator of the population
mean.

The bootstrap approach assumes that the population places probability $\frac{1}{3}$
on each of the three values 2, 3, and 7. The mean of this distribution is 4.
From this population there are 27 samples of size 3 that might be drawn.
Sample means can be 2 (sample values 2,2,2, with probability $\frac{1}{27}$), $\frac{7}{3}$ (sample
values 2,2,3, 2,3,2, and 3,2,2, with probability $\frac{3}{27}$), and so on, up to 7 with
probability $\frac{1}{27}$. The mean-squared error is

$$(2 - 4)^2(1/27) + (7/3 - 4)^2(3/27) + \cdots + (7 - 4)^2(1/27) = \frac{14}{9}.$$

The usual approach is to note that the sample mean is unbiased and therefore

$$\mathrm{MSE}(\bar{X}) = \mathrm{Var}(\bar{X}) = \sigma^2/n.$$

With the variance unknown, a reasonable choice is to use the sample variance.
With a denominator of $n$, for this example, the estimated mean-squared error
is

$$\frac{\frac{1}{3}[(2 - 4)^2 + (3 - 4)^2 + (7 - 4)^2]}{3} = \frac{14}{9},$$

the same as the bootstrap estimate. □
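The enumeration in Example 17.11 is small enough to carry out exactly. The following sketch (the function name is hypothetical) lists all 27 equally likely bootstrap samples and averages the squared error of each sample mean about the mean of the empirical distribution, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def bootstrap_mse_of_mean(sample):
    """Exact bootstrap MSE of the sample mean, by full enumeration."""
    n = len(sample)
    center = Fraction(sum(sample), n)            # mean of the empirical distribution
    total = Fraction(0)
    for resample in product(sample, repeat=n):   # n**n equally likely resamples
        total += (Fraction(sum(resample), n) - center) ** 2
    return total / n ** n


if __name__ == "__main__":
    print(bootstrap_mse_of_mean([2, 3, 7]))
```

Full enumeration grows as $n^n$, which is why simulation from the empirical distribution takes over for larger samples.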

In many situations, determination of the mean-squared error is not so easy, 
and then the bootstrap becomes an extremely useful tool. While simulation 
was not needed for the example, note that an original sample size of 3 led to 
27 possible bootstrap values. Once the sample size gets beyond 6, it becomes 
impractical to enumerate all the cases. In that case, simulating observations 
from the empirical distribution becomes the only feasible choice. 

Example 17.12 In Example 11.3 an empirical model for time to death was
obtained. The empirical probabilities are 0.0333, 0.0744, 0.0343, 0.0660, 0.0344,
and 0.0361 that death is at times 0.8, 2.9, 3.1, 4.0, 4.1, and 4.8 respectively.
The remaining 0.7215 probability is that the person will be alive five years
from now. The expected present value for a five-year term insurance policy
that pays 1,000 at the moment of death is estimated as

$$1,000(0.0333v^{0.8} + \cdots + 0.0361v^{4.8}) = 223.01,$$

where $v = 1.07^{-1}$. Simulate 10,000 bootstrap samples to estimate the mean-squared
error of this estimator.

A method for conducting a bootstrap simulation with the Kaplan-Meier
estimate is given by Efron [31]. Rather than simulate from the empirical
distribution (as given by the Kaplan-Meier estimate), simulate from the original
sample. In this example, that means assigning probability $\frac{1}{40}$ to each of the
original observations. Then each bootstrap observation is a left-truncation
point along with the accompanying censored or uncensored value. After 40
such observations are recorded, the Kaplan-Meier estimate is constructed
from the bootstrap sample and then the quantity of interest computed. This
is relatively easy because the bootstrap estimate can place probability only
at the six original points. Ten thousand simulations were quickly done. The
mean was 222.05 and the mean-squared error was 4,119. Efron also noted
that the bootstrap estimate of the variance of $S(t)$ is asymptotically equal to
Greenwood's estimate, thus giving credence to both methods. □
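The quantity recomputed for each bootstrap sample is the expected present value shown in the example. A minimal sketch of that calculation follows; the function name and the dictionary layout are assumptions, and the 40 original observations from Example 11.3 are not reproduced here, so the resampling step itself is not shown.

```python
def term_insurance_epv(death_probs, benefit=1000.0, interest=0.07):
    """Expected present value of a benefit paid at the moment of death.

    death_probs maps each possible time of death to its estimated probability.
    """
    v = 1.0 / (1.0 + interest)
    return benefit * sum(prob * v ** t for t, prob in death_probs.items())


# Empirical probabilities quoted in Example 17.12.
EMPIRICAL = {0.8: 0.0333, 2.9: 0.0744, 3.1: 0.0343,
             4.0: 0.0660, 4.1: 0.0344, 4.8: 0.0361}

if __name__ == "__main__":
    print(round(term_insurance_epv(EMPIRICAL), 2))
```

In a bootstrap run, the same function would be applied to the Kaplan-Meier probabilities built from each resampled data set.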
17.13 A sample of three items from the uniform(0, $\theta$) distribution produced
the following values: 2, 4, and 7. Consider the estimator of $\theta$,

$$\hat{\theta} = \tfrac{4}{3}\max(x_1, x_2, x_3).$$

From Example 9.15 the mean-squared error of this unbiased estimator was
shown to be $\theta^2/15$.

(a) Estimate the mean-squared error by replacing $\theta$ with its estimate.

(b) Obtain the bootstrap estimate of the variance of the estimator. (It
is not possible to use the bootstrap to estimate the mean-squared
error because you cannot obtain the true value of $\theta$ from the empirical
distribution, but you can obtain the expected value of the


Appendix A 
An inventory of 
continuous distributions 


A.1 INTRODUCTION 


Descriptions of the models are given below. First a few mathematical preliminaries
are presented that indicate how the various quantities can be computed.
The incomplete gamma function¹ is given by

$$\Gamma(\alpha; x) = \frac{1}{\Gamma(\alpha)}\int_0^x t^{\alpha - 1}e^{-t}\,dt, \quad \alpha > 0,\ x > 0,$$

with $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1}e^{-t}\,dt$, $\alpha > 0$.

¹ Some references, such as [3], denote this integral $P(\alpha, x)$ and define $\Gamma(\alpha, x) =
\int_x^{\infty} t^{\alpha - 1}e^{-t}\,dt$. Note that this definition does not normalize by dividing by $\Gamma(\alpha)$. When

Also, define

$$G(\alpha; x) = \int_x^{\infty} t^{\alpha - 1}e^{-t}\,dt, \quad x > 0.$$


At times we will need this integral for nonpositive values of $\alpha$. Integration by
parts produces the relationship

$$G(\alpha; x) = -\frac{x^{\alpha}e^{-x}}{\alpha} + \frac{1}{\alpha}G(\alpha + 1; x).$$

This can be repeated until the first argument of $G$ is $\alpha + k$, a positive number.
Then it can be evaluated from

$$G(\alpha + k; x) = \Gamma(\alpha + k)[1 - \Gamma(\alpha + k; x)].$$

However, if $\alpha$ is a negative integer or zero, the value of $G(0; x)$ is needed. It
is

$$G(0; x) = \int_x^{\infty} t^{-1}e^{-t}\,dt = E_1(x),$$

which is called the exponential integral. A series expansion for this integral
is

$$E_1(x) = -0.57721566490153 - \ln x - \sum_{n=1}^{\infty}\frac{(-1)^n x^n}{n(n!)}.$$
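The series for $E_1(x)$ and the integration-by-parts relationship can be combined to evaluate $G(\alpha; x)$ when $\alpha$ is zero or a negative integer. The sketch below is illustrative: the function names and the fixed number of series terms are assumptions, and the alternating series loses accuracy for large $x$, so it is intended for moderate arguments only.

```python
import math

EULER_CONSTANT = 0.57721566490153


def exponential_integral_e1(x, terms=60):
    """E_1(x) from the series -0.5772... - ln x - sum (-1)^n x^n / (n n!)."""
    total = -EULER_CONSTANT - math.log(x)
    term = 1.0                       # holds (-x)^n / n!
    for n in range(1, terms + 1):
        term *= -x / n
        total -= term / n
    return total


def g_nonpositive_integer(alpha, x):
    """G(alpha; x) for alpha = 0, -1, -2, ..., using the recursion from the text."""
    if alpha == 0:
        return exponential_integral_e1(x)
    return (-(x ** alpha) * math.exp(-x) + g_nonpositive_integer(alpha + 1, x)) / alpha
```

For example, $G(-1; 1) = e^{-1} - E_1(1)$, which the recursion reproduces in one step.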

When $\alpha$ is a positive integer, the incomplete gamma function can be evaluated
exactly as given in the following theorem.

Theorem A.1 For integer $\alpha$,

$$\Gamma(\alpha; x) = 1 - \sum_{j=0}^{\alpha - 1}\frac{x^j e^{-x}}{j!}.$$

Proof: For $\alpha = 1$, $\Gamma(1; x) = \int_0^x e^{-t}\,dt = 1 - e^{-x}$, and so the theorem is
true for this case. The proof is completed by induction. Assume it is true for

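The incomplete beta function is given by

$$\beta(a, b; x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\int_0^x t^{a - 1}(1 - t)^{b - 1}\,dt, \quad a > 0,\ b > 0,\ 0 < x < 1,$$

and when $b < 0$ (but $a > 1 + \lfloor -b \rfloor$), repeated integration by parts produces

must be positive (that is, $a - r - 1 > 0$).

Numerical approximations for both the incomplete gamma and the incomplete
beta function are available in many statistical computing packages as
well as in many spreadsheets because they are just the distribution functions
of the gamma and beta distributions.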


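where $Z$ has the standard normal distribution. Then, for $z > 0$, $\Phi(z) =
0.5 + \Gamma(0.5; z^2/2)/2$ while, for $z < 0$, $\Phi(z) = 1 - \Phi(-z)$.

The incomplete beta function can be evaluated by the series expansion

$$\beta(a, b; x) = \frac{\Gamma(a + b)x^a(1 - x)^b}{a\Gamma(a)\Gamma(b)}\left[1 + \sum_{n=0}^{\infty}\frac{(a + b)(a + b + 1)\cdots(a + b + n)}{(a + 1)(a + 2)\cdots(a + n + 1)}x^{n + 1}\right].$$

The gamma function itself can be found from

$$\ln\Gamma(\alpha) = \left(\alpha - \tfrac{1}{2}\right)\ln\alpha - \alpha + \frac{\ln(2\pi)}{2}
+ \frac{1}{12\alpha} - \frac{1}{360\alpha^3} + \frac{1}{1,260\alpha^5} - \frac{1}{1,680\alpha^7} + \frac{1}{1,188\alpha^9} - \frac{691}{360,360\alpha^{11}}
+ \frac{1}{156\alpha^{13}} - \frac{3,617}{122,400\alpha^{15}} + \frac{43,867}{244,188\alpha^{17}} - \frac{174,611}{125,400\alpha^{19}}.$$

For values of $\alpha$ above 10 the error is less than $10^{-19}$. For values below 10 use
the relationship

$$\ln\Gamma(\alpha) = \ln\Gamma(\alpha + 1) - \ln\alpha.$$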
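The series for $\ln\Gamma(\alpha)$ and the shifting relationship for arguments below 10 translate directly into code. The sketch below is an illustration with a hypothetical function name; in double precision the result is limited by floating-point rounding, not by the series error quoted above.

```python
import math

# (numerator, denominator) of the coefficients of 1/alpha, 1/alpha^3, ..., 1/alpha^19.
_SERIES = [(1, 12), (-1, 360), (1, 1260), (-1, 1680), (1, 1188),
           (-691, 360360), (1, 156), (-3617, 122400),
           (43867, 244188), (-174611, 125400)]


def ln_gamma(alpha):
    """ln Gamma(alpha) for alpha > 0 using the asymptotic series."""
    shift = 0.0
    while alpha < 10.0:              # ln Gamma(a) = ln Gamma(a + 1) - ln a
        shift -= math.log(alpha)
        alpha += 1.0
    total = (alpha - 0.5) * math.log(alpha) - alpha + 0.5 * math.log(2.0 * math.pi)
    power = alpha
    for num, den in _SERIES:
        total += num / (den * power)
        power *= alpha * alpha
    return total + shift
```

The standard library's `math.lgamma` offers an independent check; for instance `ln_gamma(5.0)` should agree with $\ln 24$.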

The distributions are presented in the following way. First the name is given
along with the parameters. Many of the distributions have other names, which
are noted in parentheses. Next the density function $f(x)$ and distribution
function $F(x)$ are given. For some distributions, formulas for starting values
are given. Within each family the distributions are presented in decreasing
order with regard to the number of parameters. The Greek letters used are
selected to be consistent. Any Greek letter that is not used in the distribution
means that that distribution is a special case of one with more parameters
but with the missing parameters set equal to 1. Unless specifically indicated,
all parameters must be positive.

Except for two distributions, inflation can be recognized by simply inflating
the scale parameter $\theta$. That is, if $X$ has a particular distribution, then $cX$
has the same distribution type, with all parameters unchanged except $\theta$ is
changed to $c\theta$. For the lognormal distribution, $\mu$ changes to $\mu + \ln(c)$ with $\sigma$
unchanged, while for the inverse Gaussian both $\mu$ and $\theta$ are multiplied by $c$.

For several of the distributions, starting values are suggested. They are not
necessarily good estimators, just places from which to start an iterative procedure
to maximize the likelihood or other objective function. These are found
by either the methods of moments or percentile matching. The quantities
used are:

Moments: $m = \frac{1}{n}\sum_{i=1}^{n} x_i$, $t = \frac{1}{n}\sum_{i=1}^{n} x_i^2$,

Percentile matching: $p$ = 25th percentile, $q$ = 75th percentile.

For grouped data or data that have been truncated or censored, these
quantities may have to be approximated. Because the purpose is to obtain
starting values and not a useful estimate, it is often sufficient to just ignore
modifications. For three- and four-parameter distributions, starting values
can be obtained by using estimates from a special case, then making the new
parameters equal to 1. An all-purpose starting value rule (for when all else
fails) is to set the scale parameter ($\theta$) equal to the mean and set all other
parameters equal to 2.

All the distributions listed here (and many more) are discussed in great
detail in [73]. In many cases, alternatives to maximum likelihood estimators
are presented.


A.2 TRANSFORMED BETA FAMILY 


A.2.1 Four-parameter distribution 

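A.2.1.1 Transformed beta - $\alpha, \theta, \gamma, \tau$ (generalized beta of the second kind,
Pearson Type VI)

$$f(x) = \frac{\Gamma(\alpha + \tau)}{\Gamma(\alpha)\Gamma(\tau)}\,\frac{\gamma(x/\theta)^{\gamma\tau}}{x[1 + (x/\theta)^{\gamma}]^{\alpha + \tau}},$$

$$F(x) = \beta(\tau, \alpha; u), \quad u = \frac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}},$$

$$E[X^k] = \frac{\theta^k\Gamma(\tau + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)\Gamma(\tau)}, \quad -\tau\gamma < k < \alpha\gamma,$$

$$E[(X \wedge x)^k] = \frac{\theta^k\Gamma(\tau + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)\Gamma(\tau)}\,\beta(\tau + k/\gamma, \alpha - k/\gamma; u) + x^k[1 - F(x)], \quad k > -\tau\gamma,$$

$$\text{Mode} = \theta\left(\frac{\tau\gamma - 1}{\alpha\gamma + 1}\right)^{1/\gamma}, \quad \tau\gamma > 1, \text{ else } 0.$$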


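A.2.2 Three-parameter distributions

A.2.2.1 Generalized Pareto - $\alpha, \theta, \tau$ (beta of the second kind)

$$f(x) = \frac{\Gamma(\alpha + \tau)}{\Gamma(\alpha)\Gamma(\tau)}\,\frac{\theta^{\alpha}x^{\tau - 1}}{(x + \theta)^{\alpha + \tau}},$$

$$F(x) = \beta(\tau, \alpha; u), \quad u = \frac{x}{x + \theta},$$

$$E[X^k] = \frac{\theta^k\tau(\tau + 1)\cdots(\tau + k - 1)}{(\alpha - 1)\cdots(\alpha - k)} \quad \text{if } k \text{ is an integer},$$

$$E[(X \wedge x)^k] = \frac{\theta^k\Gamma(\tau + k)\Gamma(\alpha - k)}{\Gamma(\alpha)\Gamma(\tau)}\,\beta(\tau + k, \alpha - k; u) + x^k[1 - F(x)], \quad k > -\tau,$$

$$\text{Mode} = \theta\,\frac{\tau - 1}{\alpha + 1}, \quad \tau > 1, \text{ else } 0.$$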

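A.2.2.2 Burr - $\alpha, \theta, \gamma$ (Burr Type XII, Singh-Maddala)

$$f(x) = \frac{\alpha\gamma(x/\theta)^{\gamma}}{x[1 + (x/\theta)^{\gamma}]^{\alpha + 1}},$$

$$F(x) = 1 - u^{\alpha}, \quad u = \frac{1}{1 + (x/\theta)^{\gamma}},$$

$$E[X^k] = \frac{\theta^k\Gamma(1 + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)}, \quad -\gamma < k < \alpha\gamma,$$

$$E[(X \wedge x)^k] = \frac{\theta^k\Gamma(1 + k/\gamma)\Gamma(\alpha - k/\gamma)}{\Gamma(\alpha)}\,\beta(1 + k/\gamma, \alpha - k/\gamma; 1 - u) + x^k u^{\alpha}, \quad k > -\gamma,$$

$$\text{Mode} = \theta\left(\frac{\gamma - 1}{\alpha\gamma + 1}\right)^{1/\gamma}, \quad \gamma > 1, \text{ else } 0.$$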


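A.2.2.3 Inverse Burr - $\tau, \theta, \gamma$ (Dagum)

$$f(x) = \frac{\tau\gamma(x/\theta)^{\gamma\tau}}{x[1 + (x/\theta)^{\gamma}]^{\tau + 1}},$$

$$F(x) = u^{\tau}, \quad u = \frac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}},$$

$$E[X^k] = \frac{\theta^k\Gamma(\tau + k/\gamma)\Gamma(1 - k/\gamma)}{\Gamma(\tau)}, \quad -\tau\gamma < k < \gamma,$$

$$E[(X \wedge x)^k] = \frac{\theta^k\Gamma(\tau + k/\gamma)\Gamma(1 - k/\gamma)}{\Gamma(\tau)}\,\beta(\tau + k/\gamma, 1 - k/\gamma; u) + x^k[1 - u^{\tau}], \quad k > -\tau\gamma,$$

$$\text{Mode} = \theta\left(\frac{\tau\gamma - 1}{\gamma + 1}\right)^{1/\gamma}, \quad \tau\gamma > 1, \text{ else } 0.$$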


m 


F{x )= 

m k ]= 

E[^ fe ]= 

EpTAx) fc ]= 

Mode = 
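A.2.3.3 Loglogistic - $\gamma, \theta$ (Fisk)

$$f(x) = \frac{\gamma(x/\theta)^{\gamma}}{x[1 + (x/\theta)^{\gamma}]^2},$$

$$F(x) = u, \quad u = \frac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}},$$

$$E[X^k] = \theta^k\Gamma(1 + k/\gamma)\Gamma(1 - k/\gamma), \quad -\gamma < k < \gamma,$$

$$E[(X \wedge x)^k] = \theta^k\Gamma(1 + k/\gamma)\Gamma(1 - k/\gamma)\,\beta(1 + k/\gamma, 1 - k/\gamma; u) + x^k(1 - u), \quad k > -\gamma,$$

$$\text{Mode} = \theta\left(\frac{\gamma - 1}{\gamma + 1}\right)^{1/\gamma}, \quad \gamma > 1, \text{ else } 0,$$

$$\hat{\gamma} = \frac{2\ln(3)}{\ln(q) - \ln(p)}, \quad \hat{\theta} = \exp\left(\frac{\ln(q) + \ln(p)}{2}\right).$$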


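A.2.3.4 Paralogistic - $\alpha, \theta$ This is a Burr distribution with $\gamma = \alpha$.

$$f(x) = \frac{\alpha^2(x/\theta)^{\alpha}}{x[1 + (x/\theta)^{\alpha}]^{\alpha + 1}},$$

$$F(x) = 1 - u^{\alpha}, \quad u = \frac{1}{1 + (x/\theta)^{\alpha}},$$

$$E[X^k] = \frac{\theta^k\Gamma(1 + k/\alpha)\Gamma(\alpha - k/\alpha)}{\Gamma(\alpha)}, \quad -\alpha < k < \alpha^2,$$

$$E[(X \wedge x)^k] = \frac{\theta^k\Gamma(1 + k/\alpha)\Gamma(\alpha - k/\alpha)}{\Gamma(\alpha)}\,\beta(1 + k/\alpha, \alpha - k/\alpha; 1 - u) + x^k u^{\alpha}, \quad k > -\alpha,$$

$$\text{Mode} = \theta\left(\frac{\alpha - 1}{\alpha^2 + 1}\right)^{1/\alpha}, \quad \alpha > 1, \text{ else } 0.$$

Starting values can use estimates from the loglogistic (use $\gamma$ for $\alpha$) or Pareto
(use $\alpha$) distributions.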


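A.3.1.2 Inverse transformed gamma - $\alpha, \theta, \tau$ (inverse generalized gamma)

$$f(x) = \frac{\tau u^{\alpha}e^{-u}}{x\Gamma(\alpha)}, \quad u = (\theta/x)^{\tau},$$

$$F(x) = 1 - \Gamma(\alpha; u),$$

$$E[X^k] = \frac{\theta^k\Gamma(\alpha - k/\tau)}{\Gamma(\alpha)}, \quad k < \alpha\tau,$$

$$E[(X \wedge x)^k] = \frac{\theta^k\Gamma(\alpha - k/\tau)}{\Gamma(\alpha)}[1 - \Gamma(\alpha - k/\tau; u)] + x^k\Gamma(\alpha; u)
= \frac{\theta^k G(\alpha - k/\tau; u)}{\Gamma(\alpha)} + x^k\Gamma(\alpha; u), \quad \text{all } k,$$

$$\text{Mode} = \theta\left(\frac{\tau}{\alpha\tau + 1}\right)^{1/\tau}.$$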


A.3.2 Two-parameter distributions 

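A.3.2.1 Gamma—α, θ

$f(x) = \dfrac{(x/\theta)^{\alpha} e^{-x/\theta}}{x\,\Gamma(\alpha)}$

$F(x) = \Gamma(\alpha; x/\theta)$

$E[X^k] = \dfrac{\theta^k\,\Gamma(\alpha+k)}{\Gamma(\alpha)}, \quad k > -\alpha$

$E[X^k] = \theta^k(\alpha+k-1)\cdots\alpha$ if k is an integer

$E[(X\wedge x)^k] = \dfrac{\theta^k\,\Gamma(\alpha+k)}{\Gamma(\alpha)}\,\Gamma(\alpha+k; x/\theta) + x^k[1-\Gamma(\alpha; x/\theta)], \quad k > -\alpha$

$\qquad = \alpha(\alpha+1)\cdots(\alpha+k-1)\,\theta^k\,\Gamma(\alpha+k; x/\theta) + x^k[1-\Gamma(\alpha; x/\theta)]$ if k is an integer

$M(t) = (1-\theta t)^{-\alpha}, \quad t < 1/\theta$

$\text{Mode} = \theta(\alpha-1), \quad \alpha > 1$, else 0

$\hat\alpha = \dfrac{m^2}{t-m^2}, \quad \hat\theta = \dfrac{t-m^2}{m}$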
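As a rough illustration of the gamma starting values, the sketch below computes $\hat\alpha = m^2/(t-m^2)$ and $\hat\theta = (t-m^2)/m$, assuming m is the sample mean and t is the sample second raw moment. The function name and the data are hypothetical.

```python
def gamma_moment_start(data):
    """Method-of-moments starting values (alpha, theta) for the gamma.

    Assumes m is the sample mean and t the sample second raw moment.
    """
    n = len(data)
    m = sum(data) / n
    t = sum(x * x for x in data) / n
    alpha = m * m / (t - m * m)
    theta = (t - m * m) / m
    return alpha, theta

# Illustrative data only.
alpha0, theta0 = gamma_moment_start([1.0, 2.0, 3.0, 4.0, 5.0])

# The fitted gamma reproduces the first two raw moments:
# E[X] = alpha*theta and E[X^2] = alpha*(alpha+1)*theta^2.
fitted_mean = alpha0 * theta0
fitted_second = alpha0 * (alpha0 + 1.0) * theta0 ** 2
```

These values are meant only to start a likelihood maximization, not to serve as final estimates.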

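A.3.2.2 Inverse gamma—α, θ (Vinci)

$f(x) = \dfrac{(\theta/x)^{\alpha} e^{-\theta/x}}{x\,\Gamma(\alpha)}$

$F(x) = 1 - \Gamma(\alpha; \theta/x)$

$E[X^k] = \dfrac{\theta^k\,\Gamma(\alpha-k)}{\Gamma(\alpha)}, \quad k < \alpha$

$E[X^k] = \dfrac{\theta^k}{(\alpha-1)\cdots(\alpha-k)}$ if k is an integer

$E[(X\wedge x)^k] = \dfrac{\theta^k\,\Gamma(\alpha-k)}{\Gamma(\alpha)}\,[1-\Gamma(\alpha-k; \theta/x)] + x^k\,\Gamma(\alpha; \theta/x)$

$\qquad = \dfrac{\theta^k\,G(\alpha-k; \theta/x)}{\Gamma(\alpha)} + x^k\,\Gamma(\alpha; \theta/x)$, all k

$\text{Mode} = \theta/(\alpha+1)$

$\hat\alpha = \dfrac{2t-m^2}{t-m^2}, \quad \hat\theta = \dfrac{mt}{t-m^2}$

A.3.2.3 Weibull—θ, τ

$f(x) = \dfrac{\tau (x/\theta)^{\tau} e^{-(x/\theta)^{\tau}}}{x}$

$F(x) = 1 - e^{-(x/\theta)^{\tau}}$

$E[X^k] = \theta^k\,\Gamma(1+k/\tau), \quad k > -\tau$

$E[(X\wedge x)^k] = \theta^k\,\Gamma(1+k/\tau)\,\Gamma[1+k/\tau; (x/\theta)^{\tau}] + x^k e^{-(x/\theta)^{\tau}}, \quad k > -\tau$

$\text{Mode} = \theta\left(\dfrac{\tau-1}{\tau}\right)^{1/\tau}, \quad \tau > 1$, else 0

$\hat\theta = \exp\left(\dfrac{g\ln(p)-\ln(q)}{g-1}\right), \quad g = \dfrac{\ln(\ln(4))}{\ln(\ln(4/3))}$

$\hat\tau = \dfrac{\ln(\ln(4))}{\ln(q)-\ln(\hat\theta)}$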

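A.3.2.4 Inverse Weibull—θ, τ (log-Gompertz)

$f(x) = \dfrac{\tau (\theta/x)^{\tau} e^{-(\theta/x)^{\tau}}}{x}$

$F(x) = e^{-(\theta/x)^{\tau}}$

$E[X^k] = \theta^k\,\Gamma(1-k/\tau), \quad k < \tau$

$E[(X\wedge x)^k] = \theta^k\,\Gamma(1-k/\tau)\{1-\Gamma[1-k/\tau; (\theta/x)^{\tau}]\} + x^k[1-e^{-(\theta/x)^{\tau}}]$

$\qquad = \theta^k\,G[1-k/\tau; (\theta/x)^{\tau}] + x^k[1-e^{-(\theta/x)^{\tau}}]$, all k

$\text{Mode} = \theta\left(\dfrac{\tau}{\tau+1}\right)^{1/\tau}$

$\hat\theta = \exp\left(\dfrac{g\ln(q)-\ln(p)}{g-1}\right), \quad g = \dfrac{\ln(\ln(4))}{\ln(\ln(4/3))}$

$\hat\tau = \dfrac{\ln(\ln(4))}{\ln(\hat\theta)-\ln(p)}$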


A.3.3 One-parameter distributions 

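A.3.3.1 Exponential—θ

$f(x) = \dfrac{e^{-x/\theta}}{\theta}$

$F(x) = 1 - e^{-x/\theta}$

$E[X^k] = \theta^k\,\Gamma(k+1), \quad k > -1$

$E[X^k] = \theta^k k!$ if k is an integer

$E[X\wedge x] = \theta(1-e^{-x/\theta})$

$E[(X\wedge x)^k] = \theta^k\,\Gamma(k+1)\,\Gamma(k+1; x/\theta) + x^k e^{-x/\theta}, \quad k > -1$

$\qquad = \theta^k k!\,\Gamma(k+1; x/\theta) + x^k e^{-x/\theta}$ if k > -1 is an integer

$M(t) = (1-\theta t)^{-1}, \quad t < 1/\theta$

$\text{Mode} = 0$

$\hat\theta = m$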


A.3.3.2 Inverse exponential—θ

$f(x) = \dfrac{\theta e^{-\theta/x}}{x^2}$

$F(x) = e^{-\theta/x}$

$E[X^k] = \theta^k\,\Gamma(1-k), \quad k < 1$

$E[(X\wedge x)^k] = \theta^k\,G(1-k; \theta/x) + x^k(1-e^{-\theta/x})$, all k

$\text{Mode} = \theta/2$

$\hat\theta = -q\ln(3/4)$


A.4 OTHER DISTRIBUTIONS 

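A.4.1.1 Lognormal—μ, σ (μ can be negative)

$f(x) = \dfrac{1}{x\sigma\sqrt{2\pi}}\exp(-z^2/2) = \phi(z)/(\sigma x), \quad z = \dfrac{\ln x-\mu}{\sigma}$

$F(x) = \Phi(z)$

$E[X^k] = \exp(k\mu + \tfrac12 k^2\sigma^2)$

$E[(X\wedge x)^k] = \exp(k\mu + \tfrac12 k^2\sigma^2)\,\Phi\left(\dfrac{\ln x-\mu-k\sigma^2}{\sigma}\right) + x^k[1-F(x)]$

$\text{Mode} = \exp(\mu-\sigma^2)$

$\hat\sigma = \sqrt{\ln(t)-2\ln(m)}, \quad \hat\mu = \ln(m)-\tfrac12\hat\sigma^2$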
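The lognormal limited expected value is straightforward to evaluate with a standard normal cdf. The following is a minimal sketch of $E[(X\wedge x)^k] = \exp(k\mu + k^2\sigma^2/2)\,\Phi[(\ln x-\mu-k\sigma^2)/\sigma] + x^k[1-F(x)]$; the function names are hypothetical, and $\Phi$ is computed from `math.erf`.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_lev(x, mu, sigma, k=1):
    """E[(X ^ x)^k] for a lognormal with parameters mu and sigma."""
    z = (math.log(x) - mu) / sigma
    raw_moment = math.exp(k * mu + 0.5 * k * k * sigma * sigma)
    # (ln x - mu - k sigma^2)/sigma equals z - k*sigma
    return raw_moment * std_normal_cdf(z - k * sigma) + x ** k * (1.0 - std_normal_cdf(z))

# Illustrative parameters only.
lev_at_1 = lognormal_lev(1.0, mu=0.0, sigma=1.0)
lev_far = lognormal_lev(1.0e12, mu=0.0, sigma=1.0)
```

As the limit grows, the limited expected value approaches the unlimited mean $\exp(\mu+\sigma^2/2)$, and it can never exceed the limit itself.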


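A.4.1.2 Inverse Gaussian—μ, θ

$f(x) = \left(\dfrac{\theta}{2\pi x^3}\right)^{1/2}\exp\left(-\dfrac{\theta z^2}{2x}\right), \quad z = \dfrac{x-\mu}{\mu}$

$F(x) = \Phi\left[z\left(\dfrac{\theta}{x}\right)^{1/2}\right] + \exp\left(\dfrac{2\theta}{\mu}\right)\Phi\left[-y\left(\dfrac{\theta}{x}\right)^{1/2}\right], \quad y = \dfrac{x+\mu}{\mu}$

$E[X] = \mu, \quad \mathrm{Var}[X] = \mu^3/\theta$

$E[X\wedge x] = x - \mu z\,\Phi\left[z\left(\dfrac{\theta}{x}\right)^{1/2}\right] - \mu y\exp\left(\dfrac{2\theta}{\mu}\right)\Phi\left[-y\left(\dfrac{\theta}{x}\right)^{1/2}\right]$

$M(t) = \exp\left[\dfrac{\theta}{\mu}\left(1-\sqrt{1-\dfrac{2t\mu^2}{\theta}}\right)\right], \quad t < \dfrac{\theta}{2\mu^2}$

$\hat\mu = m, \quad \hat\theta = \dfrac{m^3}{t-m^2}$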


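A.4.1.3 log-t—r, μ, σ (μ can be negative)
Let Y have a t distribution with r degrees of freedom. Then X = exp(σY + μ) has the log-t distribution. Just as the t distribution has a heavier tail than the normal distribution, this distribution has a heavier tail than the lognormal distribution.

$f(x) = \dfrac{\Gamma\left(\frac{r+1}{2}\right)}{x\sigma\sqrt{\pi r}\,\Gamma\left(\frac{r}{2}\right)\left[1+\frac{1}{r}\left(\frac{\ln x-\mu}{\sigma}\right)^2\right]^{(r+1)/2}}$

$F(x) = F_r\left(\dfrac{\ln x-\mu}{\sigma}\right)$ with $F_r(t)$ the cdf of a t distribution with r d.f.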












Appendix B 
An inventory of 
discrete distributions 


B.1 INTRODUCTION

The 16 models fall into three classes. The divisions are based on the algorithm by which the probabilities are computed. For some of the more familiar distributions these formulas will look different from the ones you may have learned, but they produce the same probabilities. After each name, the parameters are given. All parameters are positive unless otherwise indicated. In all cases, $p_k$ is the probability of observing k losses.

For finding moments, the most convenient form is to give the factorial moments. The jth factorial moment is $\mu_{(j)} = E[N(N-1)\cdots(N-j+1)]$. We have $E[N] = \mu_{(1)}$ and $\mathrm{Var}(N) = \mu_{(2)} + \mu_{(1)} - \mu_{(1)}^2$.

The estimators presented here are not intended to be useful estimators; rather, they provide starting values for maximizing the likelihood (or other) function. For determining starting values, the following quantities are used [where $n_k$ is the observed frequency at k (if, for the last entry, $n_k$ represents the number of observations at k or more, assume it was at exactly k) and n is the sample size]:

$\hat\mu = \dfrac{1}{n}\sum_{k=1}^{\infty} k\,n_k, \quad \hat\sigma^2 = \dfrac{1}{n}\sum_{k=1}^{\infty} k^2 n_k - \hat\mu^2.$

Loss Models: From Data to Decisions, Second Edition.
By Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot
ISBN 0-471-21577-5 Copyright © 2004 John Wiley & Sons, Inc.

When the method of moments is used to determine the starting value, a circumflex (e.g., $\hat\lambda$) is used. For any other method, a tilde (e.g., $\tilde\lambda$) is used. When the starting value formulas do not provide admissible parameter values, a truly crude guess is to set the product of all λ and β parameters equal to the sample mean and set all other parameters equal to 1. If there are two λ and/or β parameters, an easy choice is to set each to the square root of the sample mean.

The last item presented is the probability generating function, $P(z) = E[z^N]$.

B.2 THE (a, b, 0) CLASS

The distributions in this class have support on 0, 1, .... For this class, a particular distribution is specified by setting $p_0$ and then using $p_k = (a + b/k)p_{k-1}$. Specific members are created by setting $p_0$, a, and b. For any member, $\mu_{(1)} = (a+b)/(1-a)$, and for higher j, $\mu_{(j)} = (aj+b)\mu_{(j-1)}/(1-a)$. The variance is $(a+b)/(1-a)^2$.
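The recursion $p_k = (a + b/k)p_{k-1}$ is easy to run directly. The sketch below is one possible implementation (the function name is hypothetical); it is illustrated with the Poisson values $p_0 = e^{-\lambda}$, a = 0, b = λ and the negative binomial values $p_0 = (1+\beta)^{-r}$, $a = \beta/(1+\beta)$, $b = (r-1)\beta/(1+\beta)$, with arbitrary illustrative parameters.

```python
import math

def ab0_probabilities(p0, a, b, kmax):
    """Return [p_0, ..., p_kmax] from the (a, b, 0) recursion p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, kmax + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

# Poisson with lambda = 2 (illustrative value).
lam = 2.0
poisson = ab0_probabilities(math.exp(-lam), a=0.0, b=lam, kmax=10)

# Negative binomial with beta = 1.5, r = 3 (illustrative values).
beta, r = 1.5, 3.0
negbin = ab0_probabilities((1.0 + beta) ** (-r),
                           a=beta / (1.0 + beta),
                           b=(r - 1.0) * beta / (1.0 + beta),
                           kmax=10)

# Mean from the class formula (a + b)/(1 - a); equals r*beta for the negative binomial.
negbin_mean = (beta / (1 + beta) + (r - 1) * beta / (1 + beta)) / (1 - beta / (1 + beta))
```

The recursion reproduces the closed-form probabilities, so only $p_0$, a, and b need to be stored for each member.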

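B.2.1.1 Poisson—λ

$p_0 = e^{-\lambda}, \quad a = 0, \quad b = \lambda$

$p_k = \dfrac{e^{-\lambda}\lambda^k}{k!}$

$E[N] = \lambda, \quad \mathrm{Var}[N] = \lambda$

$\hat\lambda = \hat\mu$

$P(z) = e^{\lambda(z-1)}$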

B.2.1.2 Geometric—β

$p_0 = \dfrac{1}{1+\beta}, \quad a = \dfrac{\beta}{1+\beta}, \quad b = 0$

$p_k = \dfrac{\beta^k}{(1+\beta)^{k+1}}$

$E[N] = \beta, \quad \mathrm{Var}[N] = \beta(1+\beta)$

$\hat\beta = \hat\mu$

$P(z) = [1-\beta(z-1)]^{-1}$

This is a special case of the negative binomial with r = 1.
B.2.1.3 Binomial—q, m (0 < q < 1, m an integer)

$p_0 = (1-q)^m, \quad a = -\dfrac{q}{1-q}, \quad b = \dfrac{(m+1)q}{1-q}$

$p_k = \dbinom{m}{k} q^k (1-q)^{m-k}, \quad k = 0, 1, \ldots, m$

$E[N] = mq, \quad \mathrm{Var}[N] = mq(1-q)$

$\hat q = \hat\mu/m$

$P(z) = [1+q(z-1)]^m$

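B.2.1.4 Negative binomial—β, r

$p_0 = (1+\beta)^{-r}, \quad a = \dfrac{\beta}{1+\beta}, \quad b = \dfrac{(r-1)\beta}{1+\beta}$

$p_k = \dfrac{r(r+1)\cdots(r+k-1)\,\beta^k}{k!\,(1+\beta)^{r+k}}$

$E[N] = r\beta, \quad \mathrm{Var}[N] = r\beta(1+\beta)$

$\hat\beta = \dfrac{\hat\sigma^2}{\hat\mu} - 1, \quad \hat r = \dfrac{\hat\mu^2}{\hat\sigma^2-\hat\mu}$

$P(z) = [1-\beta(z-1)]^{-r}$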


B.3 THE (a, b, 1) CLASS

To distinguish this class from the (a, b, 0) class, the probabilities are denoted $\Pr(N = k) = p_k^M$ or $\Pr(N = k) = p_k^T$ depending on which subclass is being represented. For this class, $p_0^M$ is arbitrary (that is, it is a parameter) and then $p_1^M$ or $p_1^T$ is a specified function of the parameters a and b. Subsequent probabilities are obtained recursively as in the (a, b, 0) class: $p_k^M = (a + b/k)p_{k-1}^M$, k = 2, 3, ..., with the same recursion for $p_k^T$. There are two subclasses of this class. When discussing their members, we often refer to the "corresponding" member of the (a, b, 0) class. This refers to the member of that class with the same values for a and b. The notation $p_k$ will continue to be used for probabilities for the corresponding (a, b, 0) distribution.

B.3.1 The zero-truncated subclass 

The members of this class have $p_0^T = 0$ and therefore it need not be estimated. These distributions should only be used when a value of zero is impossible. The first factorial moment is $\mu_{(1)} = (a+b)/[(1-a)(1-p_0)]$, where $p_0$ is the value for the corresponding member of the (a, b, 0) class. For the logarithmic distribution (which has no corresponding member), $\mu_{(1)} = \beta/\ln(1+\beta)$. Higher factorial moments are obtained recursively with the same formula as with the (a, b, 0) class. The variance is $(a+b)[1-(a+b+1)p_0]/[(1-a)(1-p_0)]^2$. For those members of the subclass which have corresponding (a, b, 0) distributions, $p_k^T = p_k/(1-p_0)$.

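B.3.1.1 Zero-truncated Poisson—λ

$p_1^T = \dfrac{\lambda}{e^{\lambda}-1}, \quad a = 0, \quad b = \lambda$

$p_k^T = \dfrac{\lambda^k}{k!\,(e^{\lambda}-1)}$

$E[N] = \lambda/(1-e^{-\lambda}), \quad \mathrm{Var}[N] = \lambda[1-(\lambda+1)e^{-\lambda}]/(1-e^{-\lambda})^2$

$\tilde\lambda = \ln(n\hat\mu/n_1)$

$P(z) = \dfrac{e^{\lambda z}-1}{e^{\lambda}-1}$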

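B.3.1.2 Zero-truncated geometric—β

$p_1^T = \dfrac{1}{1+\beta}, \quad a = \dfrac{\beta}{1+\beta}, \quad b = 0$

$p_k^T = \dfrac{\beta^{k-1}}{(1+\beta)^k}$

$E[N] = 1+\beta, \quad \mathrm{Var}[N] = \beta(1+\beta)$

$\hat\beta = \hat\mu - 1$

$P(z) = \dfrac{[1-\beta(z-1)]^{-1}-(1+\beta)^{-1}}{1-(1+\beta)^{-1}}$

This is a special case of the zero-truncated negative binomial with r = 1.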


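B.3.1.3 Logarithmic—β

$p_1^T = \dfrac{\beta}{(1+\beta)\ln(1+\beta)}, \quad a = \dfrac{\beta}{1+\beta}, \quad b = -\dfrac{\beta}{1+\beta}$

$p_k^T = \dfrac{\beta^k}{k(1+\beta)^k\ln(1+\beta)}$

$E[N] = \beta/\ln(1+\beta), \quad \mathrm{Var}[N] = \dfrac{\beta[1+\beta-\beta/\ln(1+\beta)]}{\ln(1+\beta)}$

This is a limiting case of the zero-truncated negative binomial as r → 0.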
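B.3.1.4 Zero-truncated binomial—q, m (0 < q < 1, m an integer)

$p_1^T = \dfrac{m(1-q)^{m-1}q}{1-(1-q)^m}, \quad a = -\dfrac{q}{1-q}, \quad b = \dfrac{(m+1)q}{1-q}$

$p_k^T = \dfrac{\binom{m}{k} q^k(1-q)^{m-k}}{1-(1-q)^m}, \quad k = 1, 2, \ldots, m$

$E[N] = \dfrac{mq}{1-(1-q)^m}$

$\mathrm{Var}[N] = \dfrac{mq[(1-q)-(1-q+mq)(1-q)^m]}{[1-(1-q)^m]^2}$

$\tilde q = \hat\mu/m$

$P(z) = \dfrac{[1+q(z-1)]^m-(1-q)^m}{1-(1-q)^m}$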

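B.3.1.5 Zero-truncated negative binomial—β, r (r > -1, r ≠ 0)

$p_1^T = \dfrac{r\beta}{(1+\beta)^{r+1}-(1+\beta)}, \quad a = \dfrac{\beta}{1+\beta}, \quad b = \dfrac{(r-1)\beta}{1+\beta}$

$p_k^T = \dfrac{r(r+1)\cdots(r+k-1)}{k!\,[(1+\beta)^r-1]}\left(\dfrac{\beta}{1+\beta}\right)^k$

$E[N] = \dfrac{r\beta}{1-(1+\beta)^{-r}}$

$\mathrm{Var}[N] = \dfrac{r\beta[(1+\beta)-(1+\beta+r\beta)(1+\beta)^{-r}]}{[1-(1+\beta)^{-r}]^2}$

$P(z) = \dfrac{[1-\beta(z-1)]^{-r}-(1+\beta)^{-r}}{1-(1+\beta)^{-r}}$

This distribution is sometimes called the extended truncated negative binomial distribution because the parameter r can extend below 0.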


B.3.2 The zero-modified subclass 

A zero-modified distribution is created by starting with a truncated distribution and then placing an arbitrary amount of probability at zero. This probability, $p_0^M$, is a parameter. The remaining probabilities are adjusted accordingly. Values of $p_k^M$ can be determined from the corresponding zero-truncated distribution as $p_k^M = (1-p_0^M)p_k^T$ or from the corresponding (a, b, 0) distribution as $p_k^M = (1-p_0^M)p_k/(1-p_0)$. The same recursion used for the zero-truncated subclass applies.

The mean is $1-p_0^M$ times the mean for the corresponding zero-truncated distribution. The variance is $1-p_0^M$ times the zero-truncated variance plus $p_0^M(1-p_0^M)$ times the square of the zero-truncated mean. The probability generating function is $P^M(z) = p_0^M + (1-p_0^M)P(z)$, where P(z) is the probability generating function for the corresponding zero-truncated distribution.

The maximum likelihood estimator of $p_0^M$ is always the sample relative frequency at 0.
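The relationship $p_k^M = (1-p_0^M)p_k/(1-p_0)$ can be sketched in a few lines. The helper below is hypothetical; setting the modified zero probability to 0 gives the zero-truncated case. It is illustrated with a Poisson base distribution and arbitrary parameter values.

```python
import math

def zero_modified(ab0_probs, p0_m):
    """Convert (a, b, 0) probabilities [p_0, p_1, ...] to zero-modified ones.

    p0_m is the probability placed at zero; p0_m = 0 gives the zero-truncated case.
    """
    p0 = ab0_probs[0]
    scale = (1.0 - p0_m) / (1.0 - p0)
    return [p0_m] + [scale * p for p in ab0_probs[1:]]

# Poisson base distribution with lambda = 1.2 (illustrative value), 40 terms.
lam = 1.2
poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(40)]

zt = zero_modified(poisson, 0.0)   # zero-truncated Poisson
zm = zero_modified(poisson, 0.3)   # zero-modified Poisson with p_0^M = 0.3

zt_mean = sum(k * p for k, p in enumerate(zt))
zm_mean = sum(k * p for k, p in enumerate(zm))
```

The zero-modified mean comes out as $1-p_0^M$ times the zero-truncated mean, in line with the statement above.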

B.4 THE COMPOUND CLASS 

Members of this class are obtained by compounding one distribution with another. That is, let N be a discrete distribution, called the primary distribution, and let $M_1, M_2, \ldots$ be identically and independently distributed with another discrete distribution, called the secondary distribution. The compound distribution is $S = M_1 + \cdots + M_N$. The probabilities for the compound distributions are found from

$p_k = \dfrac{1}{1-af_0}\sum_{y=1}^{k}\left(a+\dfrac{by}{k}\right)f_y\,p_{k-y}$

for k = 1, 2, ..., where a and b are the usual values for the primary distribution [which must be a member of the (a, b, 0) class] and $f_y$ is $p_y$ for the secondary distribution. The only two primary distributions used here are Poisson (for which $p_0 = \exp[-\lambda(1-f_0)]$) and geometric (for which $p_0 = 1/[1+\beta-\beta f_0]$). Because this information completely describes these distributions, only the names and starting values are given below.

The moments can be found from the moments of the individual distributions:

$E[S] = E[N]E[M]$ and $\mathrm{Var}[S] = E[N]\,\mathrm{Var}[M] + \mathrm{Var}[N]\,E[M]^2$.

The probability generating function is $P(z) = P_{\text{primary}}[P_{\text{secondary}}(z)]$.
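The compound recursion and the moment formulas can be checked against each other. The sketch below (hypothetical function name, illustrative parameters) builds a Poisson-Poisson (Neyman Type A) distribution from $p_k = (1-af_0)^{-1}\sum_{y=1}^{k}(a+by/k)f_y p_{k-y}$ with a = 0, b = λ₁ and $p_0 = \exp[-\lambda_1(1-f_0)]$, and compares its mean and variance with $E[N]E[M]$ and $E[N]\mathrm{Var}[M] + \mathrm{Var}[N]E[M]^2$.

```python
import math

def compound_probabilities(a, b, p0, f, kmax):
    """p_0..p_kmax for a compound distribution whose primary is in the (a, b, 0) class.

    f[y] holds the secondary probabilities; p0 = Pr(S = 0) is supplied by the caller.
    """
    def fy(y):
        return f[y] if y < len(f) else 0.0

    probs = [p0]
    for k in range(1, kmax + 1):
        total = sum((a + b * y / k) * fy(y) * probs[k - y] for y in range(1, k + 1))
        probs.append(total / (1.0 - a * fy(0)))
    return probs

# Neyman Type A: Poisson(lam1) primary, Poisson(lam2) secondary (illustrative values).
lam1, lam2 = 2.0, 0.5
f = [math.exp(-lam2) * lam2 ** y / math.factorial(y) for y in range(60)]
p0 = math.exp(-lam1 * (1.0 - f[0]))
p = compound_probabilities(a=0.0, b=lam1, p0=p0, f=f, kmax=60)

mean = sum(k * pk for k, pk in enumerate(p))
var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
```

Here the moment formulas give a mean of λ₁λ₂ = 1 and a variance of λ₁λ₂ + λ₁λ₂² = 1.5, which the recursion should reproduce up to truncation error.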

In the following list the primary distribution is always named first. For the first, second, and fourth distributions, the secondary distribution is the (a, b, 0) class member with that name. For the third and the last three distributions (the Poisson-ETNB and its two special cases) the secondary distribution is the zero-truncated version.

B.4.1 Some compound distributions 

B.4.1.1 Poisson-binomial—λ, q, m (0 < q < 1, m an integer)

$\hat q = \dfrac{\hat\sigma^2/\hat\mu-1}{m-1}, \quad \hat\lambda = \dfrac{\hat\mu}{m\hat q}$ or $\tilde q = 0.5, \quad \tilde\lambda = \dfrac{2\hat\mu}{m}$.

B.4.1.2 Poisson-Poisson—λ₁, λ₂
The parameter λ₁ is for the primary Poisson distribution, and λ₂ is for the secondary Poisson distribution. This distribution is also called the Neyman Type A.

$\tilde\lambda_1 = \tilde\lambda_2 = \sqrt{\hat\mu}$.

B.4.1.3 Geometric-extended truncated negative binomial—β₁, β₂, r (r > -1)
The parameter β₁ is for the primary geometric distribution. The last two parameters are for the secondary distribution, noting that for r = 0 the secondary distribution is logarithmic. The truncated version is used so that the extension of r is available.

$\tilde\beta_1 = \tilde\beta_2 = \sqrt{\hat\mu}$.

B.4.1.4 Geometric-Poisson—β, λ

$\tilde\beta = \tilde\lambda = \sqrt{\hat\mu}$.

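B.4.1.5 Poisson-extended truncated negative binomial—λ, β, r (r > -1, r ≠ 0)
When r = 0 the secondary distribution is logarithmic, resulting in the negative binomial distribution.

$\hat r = \dfrac{\hat\mu(K-3\hat\sigma^2+2\hat\mu)-2(\hat\sigma^2-\hat\mu)^2}{\hat\mu(K-3\hat\sigma^2+2\hat\mu)-(\hat\sigma^2-\hat\mu)^2}, \quad \hat\beta = \dfrac{\hat\sigma^2-\hat\mu}{\hat\mu(1+\hat r)}, \quad \hat\lambda = \dfrac{\hat\mu}{\hat r\hat\beta},$

or

$\tilde r = \dfrac{\hat\sigma^2 n_1/n-\hat\mu^2 n_0/n}{(\hat\sigma^2-\hat\mu^2)(n_0/n)\ln(n_0/n)-\hat\mu(\hat\mu n_0/n-n_1/n)}, \quad \tilde\beta = \dfrac{\hat\sigma^2-\hat\mu}{\hat\mu(1+\tilde r)}, \quad \tilde\lambda = \dfrac{\hat\mu}{\tilde r\tilde\beta},$

where

$K = \dfrac{1}{n}\sum_{k=0}^{\infty}k^3 n_k - 3\hat\mu\,\dfrac{1}{n}\sum_{k=0}^{\infty}k^2 n_k + 2\hat\mu^3.$

This distribution is also called the generalized Poisson-Pascal.

B.4.1.6 Polya-Aeppli—λ, β

$\hat\beta = \dfrac{\hat\sigma^2-\hat\mu}{2\hat\mu}, \quad \hat\lambda = \dfrac{\hat\mu}{1+\hat\beta}$.

This is a special case of the Poisson-extended truncated negative binomial with r = 1. It is actually a Poisson-truncated geometric.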





B.4.1.7 Poisson-inverse Gaussian—λ, β

$\tilde\lambda = -\ln(n_0/n), \quad \tilde\beta = \dfrac{4(\hat\mu-\tilde\lambda)}{\hat\mu}$.

This is a special case of the Poisson-extended truncated negative binomial with r = -0.5.


B.5 A HIERARCHY OF DISCRETE DISTRIBUTIONS 

The following table indicates which distributions are special or limiting cases 
of others. For the special cases, one parameter is set equal to a constant to 
create the special case. For the limiting cases, two parameters go to infinity 
or zero in some special way. 


Distribution                Is a special case of               Is a limiting case of
Poisson                     ZM Poisson                         Negative binomial, Poisson-binomial, Poisson-inv. Gaussian, Polya-Aeppli, Neyman-A
ZT Poisson                  ZM Poisson                         ZT negative binomial
ZM Poisson                  -                                  ZM negative binomial
Geometric                   Negative binomial, ZM geometric    Geometric-Poisson
ZT geometric                ZT negative binomial               -
ZM geometric                ZM negative binomial               -
Logarithmic                 -                                  ZT negative binomial
ZM logarithmic              -                                  ZM negative binomial
Binomial                    ZM binomial                        -
Poisson-inverse Gaussian    Poisson-ETNB                       -
Polya-Aeppli                Poisson-ETNB                       -
Neyman-A                    -                                  Poisson-ETNB


Appendix C 
Frequency and severity 
relationships 


Let $N^L$ be the number of losses random variable and let X be the severity random variable. If there is a deductible of d imposed, there are two ways to modify X. One is to create $Y^L$, the amount paid per loss:

number of payments. Let this variable be $N^P$. Assume that for each loss the probability is $v = 1 - F_X(d)$ that a payment will result. Further assume that the incidence of making a payment is independent of the number of losses. Then $N^P = L_1 + L_2 + \cdots + L_N$, where $L_j$ is 0 with probability 1 - v and is 1 with probability v. Probability generating functions yield the following relationships:

[Table: $N^L$ | Parameters for $N^P$]









and therefore v is available. It is possible to recover the distribution of $N^L$, although there is no guarantee that reversing the process will produce a legitimate probability distribution. The solutions are the same as above, but now $v = 1/[1-F_X(d)]$.

Now suppose the current frequency model is $N^d$, which is appropriate for a deductible of d. Now suppose the deductible is to be changed to $d^*$. The new frequency for payments is $N^{d^*}$ and is of the same type. Then use the table with $v = [1-F_X(d^*)]/[1-F_X(d)]$.
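The factor v is simple to compute once a severity cdf is available. The sketch below is only an illustration under stated assumptions: the severity is taken to be exponential with a hypothetical θ = 1000, and the frequency is taken to be Poisson, for which the standard thinning result λ* = vλ is used (that relationship is an assumption of this sketch rather than something quoted from the table). The function name is hypothetical.

```python
import math

def payment_factor(sev_cdf, d_new, d_old=0.0):
    """v = [1 - F_X(d_new)] / [1 - F_X(d_old)] for a change of deductible."""
    return (1.0 - sev_cdf(d_new)) / (1.0 - sev_cdf(d_old))

# Assumed exponential severity with theta = 1000 (illustrative only).
theta = 1000.0

def exp_cdf(x):
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0

# Raising the deductible from 250 to 500 thins the payment count.
v_up = payment_factor(exp_cdf, d_new=500.0, d_old=250.0)

# Removing a deductible of 250 gives v = 1/[1 - F_X(250)] > 1.
v_down = payment_factor(exp_cdf, d_new=0.0, d_old=250.0)

# Assumed Poisson frequency: the new parameter is v times the old one.
lam_d = 3.0
lam_new = v_up * lam_d
```

When v exceeds 1, as in the second call, the adjusted parameters are not guaranteed to describe a legitimate distribution for every frequency model, which is the caution raised above.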


Appendix D 
The recursive formula 


The recursive formula is (where the frequency distribution is a member of the (a, b, 1) class),

$f_S(x) = \dfrac{[p_1-(a+b)p_0]\,f_X(x) + \sum_{y=1}^{x}\left(a+\dfrac{by}{x}\right)f_X(y)\,f_S(x-y)}{1-af_X(0)},$

where $f_S(x) = \Pr(S = x)$, x = 0, 1, 2, ..., $f_X(x) = \Pr(X = x)$, x = 0, 1, 2, ..., $p_0 = \Pr(N = 0)$, and $p_1 = \Pr(N = 1)$. Note that the severity distribution (X) must place probability on non-negative integers. The formula must be initialized with the value of $f_S(0)$. These values are given in Table D.1. It should be noted that, if N is a member of the (a, b, 0) class, $p_1-(a+b)p_0 = 0$, so the first term in the numerator vanishes.
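Table D.1 Starting values ($f_S(0)$) for recursions

Distribution            $f_S(0)$
Poisson                 $\exp[\lambda(f_0-1)]$
Geometric               $[1+\beta(1-f_0)]^{-1}$
Binomial                $[1+q(f_0-1)]^m$
Negative binomial       $[1+\beta(1-f_0)]^{-r}$
ZM Poisson              $p_0^M + (1-p_0^M)\dfrac{\exp(\lambda f_0)-1}{\exp(\lambda)-1}$
ZM geometric            $p_0^M + (1-p_0^M)\dfrac{f_0}{1+\beta(1-f_0)}$
ZM binomial             $p_0^M + (1-p_0^M)\dfrac{[1+q(f_0-1)]^m-(1-q)^m}{1-(1-q)^m}$
ZM negative binomial    $p_0^M + (1-p_0^M)\dfrac{[1+\beta(1-f_0)]^{-r}-(1+\beta)^{-r}}{1-(1+\beta)^{-r}}$
ZM logarithmic          $p_0^M + (1-p_0^M)\left\{1-\dfrac{\ln[1+\beta(1-f_0)]}{\ln(1+\beta)}\right\}$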
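A compact way to see the recursive formula at work is to run it for an (a, b, 1) frequency. The sketch below assumes a zero-modified Poisson frequency (a = 0, b = λ, $p_1^M = (1-p_0^M)\lambda/(e^{\lambda}-1)$) with the starting value $f_S(0) = p_0^M + (1-p_0^M)[\exp(\lambda f_0)-1]/[\exp(\lambda)-1]$; the severity probabilities and parameter values are hypothetical, as is the function name.

```python
import math

def aggregate_pf(a, b, p0, p1, fs0, fx, xmax):
    """f_S(0..xmax) from the recursive formula for an (a, b, 1) frequency.

    fx[y] = Pr(X = y) on the non-negative integers; fs0 = f_S(0) is supplied.
    """
    def f(y):
        return fx[y] if y < len(fx) else 0.0

    fs = [fs0]
    for x in range(1, xmax + 1):
        conv = sum((a + b * y / x) * f(y) * fs[x - y] for y in range(1, x + 1))
        fs.append(((p1 - (a + b) * p0) * f(x) + conv) / (1.0 - a * f(0)))
    return fs

# Hypothetical example: ZM Poisson frequency, severity on {0, 1, 2, 3}.
lam, p0m = 2.0, 0.4
fx = [0.1, 0.5, 0.3, 0.1]
p1m = (1.0 - p0m) * lam / (math.exp(lam) - 1.0)
fs0 = p0m + (1.0 - p0m) * (math.exp(lam * fx[0]) - 1.0) / (math.exp(lam) - 1.0)

fs = aggregate_pf(a=0.0, b=lam, p0=p0m, p1=p1m, fs0=fs0, fx=fx, xmax=80)

agg_mean = sum(x * p for x, p in enumerate(fs))
freq_mean = (1.0 - p0m) * lam / (1.0 - math.exp(-lam))   # ZM Poisson mean
sev_mean = sum(y * p for y, p in enumerate(fx))
```

The aggregate probabilities should sum to 1 (up to truncation at xmax) and the aggregate mean should equal the product of the frequency and severity means.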


Appendix E 
Discretization of the 
severity distribution 


There are two relatively simple ways to discretize the severity distribution. One is the method of rounding and the other is a mean-preserving method.

E.1 THE METHOD OF ROUNDING

This method has two features: All probabilities are positive, and the probabilities add to 1. Let h be the span and let Y be the discretized version of X. If there are no modifications, then

$f_j = \Pr(Y = jh) = \Pr\left[(j-\tfrac12)h \le X < (j+\tfrac12)h\right] = F_X\left[(j+\tfrac12)h\right] - F_X\left[(j-\tfrac12)h\right].$
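The method of rounding can be sketched as follows, using $f_j = F_X[(j+\tfrac12)h] - F_X[(j-\tfrac12)h]$. Two assumptions are made here: the severity is non-negative, so that the first cell is taken as $f_0 = F_X(h/2)$, and the example severity is exponential with a hypothetical θ = 10. The function name is hypothetical.

```python
import math

def discretize_rounding(cdf, h, n_points):
    """Method of rounding: probabilities at 0, h, 2h, ..., (n_points - 1)h.

    Assumes a non-negative severity, so f_0 = F(h/2).
    """
    probs = [cdf(0.5 * h)]
    for j in range(1, n_points):
        probs.append(cdf((j + 0.5) * h) - cdf((j - 0.5) * h))
    return probs

# Assumed exponential severity with theta = 10 (illustrative only).
theta = 10.0

def exp_cdf(x):
    return 1.0 - math.exp(-x / theta)

f = discretize_rounding(exp_cdf, h=1.0, n_points=200)
total = sum(f)
```

The probabilities telescope, so their sum equals the cdf at the upper end of the last cell; choosing enough points makes the omitted tail negligible.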
The recursive formula is then used with $f_X(j) = f_j$. Suppose a deductible of d, limit of u, and coinsurance of α are to be applied. If the modifications are to be applied before the discretization, then

$g_0 = \dfrac{F_X(d+h/2)-F_X(d)}{1-F_X(d)},$

$g_j = \dfrac{F_X[d+(j+1/2)h]-F_X[d+(j-1/2)h]}{1-F_X(d)}, \quad j = 1, \ldots, \dfrac{u-d}{h}-1,$

$g_{(u-d)/h} = \dfrac{1-F_X(u-h/2)}{1-F_X(d)},$

where $g_j = \Pr(Z = j\alpha h)$ and Z is the modified distribution. This method does not require that the limits be multiples of h but does require that u - d be a multiple of h. This method gives the probabilities of payments per payment.

Finally, if there is truncation from above at u, change all denominators to $F_X(u)-F_X(d)$ and also change the numerator of $g_{(u-d)/h}$ to $F_X(u)-F_X(u-h/2)$.


E.2 MEAN PRESERVING 


This method ensures that the discretized distribution has the same mean as the original severity distribution. With no modifications the discretization is

$f_j = \dfrac{2E[X\wedge jh]-E[X\wedge (j-1)h]-E[X\wedge (j+1)h]}{h}, \quad j = 1, 2, \ldots.$

For the modified distribution,

$g_0 = 1-\dfrac{E[X\wedge d+h]-E[X\wedge d]}{h[1-F_X(d)]},$

$g_j = \dfrac{2E[X\wedge d+jh]-E[X\wedge d+(j-1)h]-E[X\wedge d+(j+1)h]}{h[1-F_X(d)]}, \quad j = 1, \ldots, \dfrac{u-d}{h}-1,$

$g_{(u-d)/h} = \dfrac{E[X\wedge u]-E[X\wedge u-h]}{h[1-F_X(d)]}.$

To incorporate truncation from above, change the denominators to

$h[F_X(u)-F_X(d)]$

and subtract $h[1-F_X(u)]$ from the numerators of each of $g_0$ and $g_{(u-d)/h}$.
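For the unmodified case, the mean-preserving probabilities $f_j = \{2E[X\wedge jh]-E[X\wedge (j-1)h]-E[X\wedge (j+1)h]\}/h$ can be sketched as below. The first cell is taken as $f_0 = 1 - E[X\wedge h]/h$, which is an assumption of this sketch (it is the value that makes the probabilities sum to 1). The severity is assumed exponential with a hypothetical θ = 10, for which $E[X\wedge x] = \theta(1-e^{-x/\theta})$. The function name is hypothetical.

```python
import math

def discretize_mean_preserving(lev, h, n_points):
    """Mean-preserving probabilities at 0, h, ..., (n_points - 1)h.

    lev(x) must return E[X ^ x]; f_0 = 1 - E[X ^ h]/h is assumed for the first cell.
    """
    probs = [1.0 - lev(h) / h]
    for j in range(1, n_points):
        probs.append((2.0 * lev(j * h) - lev((j - 1) * h) - lev((j + 1) * h)) / h)
    return probs

# Assumed exponential severity with theta = 10 (illustrative only).
theta = 10.0

def exp_lev(x):
    return theta * (1.0 - math.exp(-x / theta))

h = 2.0
f = discretize_mean_preserving(exp_lev, h, n_points=100)
disc_mean = sum(j * h * p for j, p in enumerate(f))
```

Apart from the small amount of probability lost by stopping at a finite number of points, the discretized mean matches the exponential mean θ.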


E.3 UNDISCRETIZATION OF A DISCRETIZED DISTRIBUTION

Assume we have $g_0 = \Pr(S = 0)$, the true probability that the random variable is zero. Let $p_j = \Pr(S^* = jh)$, where $S^*$ is a discretized distribution and h is the span. The following are approximations for the cdf and LEV of S, the true distribution which was discretized as $S^*$. They are all based on the assumption that S has a uniform distribution over the interval from $(j-\tfrac12)h$ to $(j+\tfrac12)h$ for integral j. The first interval is from 0 to h/2, and the probability $p_0-g_0$ is assumed to be uniformly distributed over it. Let $S^{**}$ be the random variable with this approximate mixed distribution. (It is continuous, except for discrete probability $g_0$ at zero.) The approximate distribution function can be found by interpolation as follows. First, let

$F_j = F_{S^{**}}\left[(j+\tfrac12)h\right] = \sum_{i=0}^{j}p_i, \quad j = 0, 1, \ldots.$

Then, for x in the interval $(j-\tfrac12)h$ to $(j+\tfrac12)h$,

$F_{S^{**}}(x) = F_{j-1} + \int_{(j-1/2)h}^{x}h^{-1}p_j\,dt = F_{j-1} + \left[x-(j-\tfrac12)h\right]h^{-1}p_j$

$\qquad = F_{j-1} + \left[x-(j-\tfrac12)h\right]h^{-1}(F_j-F_{j-1})$

$\qquad = (1-w)F_{j-1} + wF_j, \quad w = \dfrac{x}{h}-j+\dfrac12.$

Because the first interval is only half as wide, the formula for 0 ≤ x ≤ h/2 is

$F_{S^{**}}(x) = (1-w)g_0 + wp_0, \quad w = \dfrac{2x}{h}.$

It is also possible to express these formulas in terms of the discrete probabilities:

$F_{S^{**}}(x) = g_0 + \dfrac{2x}{h}[p_0-g_0], \quad 0 < x \le \dfrac{h}{2},$

$F_{S^{**}}(x) = \sum_{i=0}^{j-1}p_i + \dfrac{x-(j-1/2)h}{h}\,p_j, \quad (j-\tfrac12)h < x \le (j+\tfrac12)h.$
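The interpolated distribution function can be sketched directly from $F_{S^{**}}(x) = (1-w)F_{j-1} + wF_j$ with $w = x/h - j + \tfrac12$, and from $(1-w)g_0 + wp_0$ with $w = 2x/h$ on the first half-interval. The function name and the discretized probabilities below are hypothetical.

```python
def undiscretized_cdf(x, p, g0, h):
    """Approximate F_{S**}(x) by linear interpolation.

    p[j] = Pr(S* = jh) for the discretized distribution; g0 = Pr(S = 0).
    """
    if x < 0:
        return 0.0
    if x <= 0.5 * h:
        w = 2.0 * x / h
        return (1.0 - w) * g0 + w * p[0]
    j = int(x / h + 0.5)            # index of the interval containing x
    if j >= len(p):
        return sum(p)               # beyond the last interval
    w = x / h - j + 0.5
    f_prev = sum(p[:j])             # F_{j-1}
    f_j = f_prev + p[j]             # F_j
    return (1.0 - w) * f_prev + w * f_j

# Hypothetical discretized distribution with span h = 10 and g0 = 0.1.
p = [0.3, 0.4, 0.2, 0.1]
g0, h = 0.1, 10.0

cdf_values = [undiscretized_cdf(x, p, g0, h) for x in (0.0, 2.5, 5.0, 10.0, 15.0, 100.0)]
```

At the interval endpoints the interpolation returns the cumulative sums $F_j$, and at zero it returns the point mass $g_0$.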


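With regard to the limited expected value, expressions for the first and kth LEVs are

$E(S^{**}\wedge x) = x(1-g_0) - \dfrac{x^2}{h}(p_0-g_0), \quad 0 < x \le \dfrac{h}{2},$

$E(S^{**}\wedge x) = \dfrac{h}{4}(p_0-g_0) + \sum_{i=1}^{j-1}ihp_i + \dfrac{x^2-[(j-1/2)h]^2}{2h}\,p_j + x[1-F_{S^{**}}(x)], \quad (j-\tfrac12)h < x \le (j+\tfrac12)h$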
Appendix F
Numerical optimization
and solution of systems
of equations


Maximizing functions can be difficult when there are many variables. A variety of numerical methods have been developed, and almost any will be sufficient for the tasks set forth in this text. Here we present two options. The first is to use the Excel® Solver add-in. It is fairly reliable, though at times it may declare a maximum has been found when there is no maximum. A second option is the simplex method. This method tends to be slower but is more reliable. The final section of this appendix shows how the Solver and Goal Seek routines in Excel® can be used to solve systems of equations.
joint distribution, 362
loss function, 364 
marginal distribution, 362 
model distribution, 361 
posterior distribution, 362 
predictive distribution, 362, 545 
prior distribution, 360 
Beta distribution, 641 
Beta function, incomplete, 629 
Bias, 268 

Binomial-beta distribution, 102 
Binomial distribution, 79, 645 
estimation, 389 
Bivariate distribution, 402 
Brownian motion, 252 
relationship to min, 256 
with drift, 253 

Bühlmann credibility model, 557
Bühlmann-Straub credibility model, 560
Burr distribution, 632 

C

Censoring 
from above, 297 
from below, 297 
left, 297 
right, 297 

Central limit theorem, 36 
Bayesian, 367 
Central moment, 27 
Characteristic function, 105 
for aggregate loss, 142 
Chi-square goodness-of-fit test, 432 
Claim count random variable, 137 
Closed under convolution, 156 
Coefficient of variation, 27 
Coinsurance, 126 
Collective risk model, 135 
Complete expectation of life, 29 
Compound distribution 
for aggregate loss, 141 
frequency, 88 

Compound frequency distribution, 648 
estimation, 396 

Compound geometric-exponential 
distribution, 154 

Compound Poisson frequency distribution, 
95 

Compound Poisson process, 225 
Conditional distribution, 518 
Confidence interval, 275, 309 
log-transformed, 313-314 
Conjugate prior distribution, 373 
Consistency, 270 

maximum likelihood estimator, 352 
Construction of mortality tables, 322 


Continuous random variable, 16 
Continuous-time process, 210 
Convolution, 140 
numerical, 474 
Copula, 403 

Counting distributions, 72 
Covariates 
models with, 405 
proportional hazards model, 406 
Cox proportional hazards model, 407 
Cramer’s asymptotic ruin formula, 244 
Credibility 

Bühlmann credibility factor, 558
expected hypothetical mean, 557 
expected process variance, 557 
fully parametric, 590 
greatest accuracy, 517, 542 
Bayesian, 545 
Bühlmann, 557
Bühlmann-Straub, 560
exact credibility, 566 
fully parametric, 602 
linear, 553 
linear vs. Bayes, 569 
log-credibility, 573
nonparametric, 592 
semiparametric, 600 
hypothetical mean, 557 
limited fluctuation, 516, 530 
full credibility, 532 
partial credibility, 535 
nonparametric, 590 
partial, 535 
process variance, 557 
semiparametric, 590 
variance of the hypothetical means, 557 
Credibility factor, 535 
Cubic spline, 489 

Cumulative distribution function, 13 

Cumulative hazard rate function, 289 

D 

Data-dependent distribution, 45, 284 
Deductible 
effect of inflation, 122 
effect on frequency, 129 
franchise, 118 
ordinary, 116, 297 
Delta method, 356 
Density function, 17 
Density function plot, 423 
Difference plot, 424 
Digamma function, 588 
Discrete distribution, 72 
Discrete Fourier transform, 185 
Discrete random variable, 16 


Discrete time process, 211, 215 
Distribution 

(a, b, 0) class, 81, 644
(a, b, 1) class, 83, 645 
aggregate loss, 137 
beta, 641 

binomial-beta, 102 
binomial, 79, 645
bivariate, 402 
Burr, 632 
claim count, 137 
compound, 141 
moments, 142 

compound frequency, 88, 648 
recursive formula, 92 
compound Poisson frequency, 95 
conditional, 518 
conjugate prior, 373 
copula, 403 

counting distributions, 72
data-dependent, 45, 284 
defective, 260 
discrete, 72
empirical, 284 
exponential, 638 

exponential dispersion family, 581 
extended truncated negative binomial 
(ETNB), 87 
frailty, 62 
frequency, 137 
gamma, 48, 58, 636 
generalized beta, 640 
generalized Pareto, 631 
generalized Poisson-Pascal, 649 
generalized Waring, 103, 379 
geometric-ETNB, 649 
geometric-Poisson, 649 
geometric, 77, 644 
improper prior, 361 
individual loss, 137 
infinitely divisible, 104 
inverse Burr, 632 
inverse exponential, 638 
inverse gamma, 636 
inverse Gaussian, 48, 639 
inverse paralogistic, 635 
inverse Pareto, 633 
inverse transformed, 57
inverse transformed gamma, 71, 635
inverse Weibull, 58, 637 
joint, 362, 518 
k-point mixture, 43
kernel smoothed, 284
linear exponential family, 371
logarithmic, 87, 646
loglogistic, 66, 634 


lognormal, 59, 71, 638 
log-t, 639 
Makeham, 458 
marginal, 362, 518 
mixed frequency, 101 
mixture, 519 
mixture/mixing, 43, 59 
negative binomial, 76, 645 
as Poisson mixture, 78 
extended truncated, 87 
negative hypergeometric, 102 
Neyman Type A, 90, 649 
one-sided stable law, 261 
paralogistic, 634 
parametric, 41, 284 
parametric family, 42
Pareto, 633 
Poisson-binomial, 648 
Poisson-inverse Gaussian, 650
Poisson-Poisson, 90, 649 
Poisson-ETNB, 649 
Poisson-extended truncated negative 
binomial, 398 

Poisson-inverse Gaussian, 398 
Poisson-logarithmic, 94 
Poisson, 73, 644 
Polya-Aeppli, 649 
Polya-Eggenberger, 102 
posterior, 362 
predictive, 362, 545 
prior, 360 
scale, 41 
Sibuya, 88 

single parameter Pareto, 640 
spliced, 64 
tail weight, 48 
transformed, 57 
transformed beta, 66, 631 
transformed beta family, 69 

transformed gamma, 70, 635 
transformed gamma family, 69 
variable-component mixture, 44 
Waring, 103, 379 
Weibull, 58, 637 
Yule, 103, 379 
zero-modified, 85, 647 
zero-truncated, 85 
zero-truncated binomial, 647 
zero-truncated geometric, 646 
zero-truncated negative binomial, 647 
zero-truncated Poisson, 646 
zeta, 111, 451 

Distribution function, 13 
empirical, 288 

Distribution function plot, 421 
Empirical Bayes estimation, 589
Empirical distribution, 284 
Empirical distribution function, 288 
Empirical model, 27 
Estimate 
interval, 309 
Nelson - Aalen, 290 
Estimation 

(a, b, 1) class, 392

Bayesian, 360 

binomial distribution, 389 

compound frequency distributions, 396 

credibility interval, 365 

effect of exposure, 398 

empirical Bayes, 589 

maximum likelihood, 337 

multiple decrement tables, 324 

negative binomial, 386 

point, 266 

Poisson distribution, 383 
Estimator 

asymptotically unbiased, 270 
Bayes estimate, 364 
bias, 268

confidence interval, 275 
consistency, 270 
interval, 275 
Kaplan-Meier, 299 
kernel density, 316 
mean-squared error, 271 
method of moments, 332 
percentile matching, 333 
relative efficiency, 275 
smoothed empirical percentile, 333 
unbiased, 268 

uniformly minimum variance unbiased, 



Exact credibility, 567 
Excess loss variable, 29 
Expectation, conditional, 520 
Exponential distribution, 638 
Exposure base, 108 
Exposure, effect in estimation, 398 
Extrapolation, using splines, 504 

F 

Failure rate, 20 
Fast Fourier transform, 186 
Fisher’s information, 352 
Force of mortality, 20 
Fourier transform, 185 
Frailty model, 62 
Franchise deductible, 118 
Frequency, 137 


effect of deductible, 129 
interaction with severity, 651 
Frequency / severity interaction, 174 
Full credibility, 532
Function 

characteristic, 105 
cumulative hazard rate, 289 
density, 17 

empirical distribution, 288 
force of mortality, 20 
gamma, 58, 630 
hazard rate, 20 
incomplete beta, 629 
incomplete gamma, 58, 627 
likelihood, 338 
loglikelihood, 339 
loss, 364 

probability, 19, 73 
probability density, 17 
probability generating, 73 
survival, 16 

G 

Gamma distribution, 58, 636 
Gamma function, 58, 630 
incomplete, 627 
Gamma kernel, 318 
Generalized beta distribution, 640 
Generalized linear model, 413 
Generalized Pareto distribution, 631 
Generalized Poisson-Pascal distribution, 
649 

Generalized Waring distribution, 103, 379 
Generating function 
moment, 36 
probability, 36 

Geometric-ETNB distribution, 649 

Geometric-Poisson distribution, 649

Geometric distribution, 77, 644 
Greatest accuracy credibility, 517, 542 
Greenwood’s approximation, 311 

H

Hazard rate, 20 
cumulative, 289 
tail weight, 50 

Heckman-Meyers formula, 188 
Histogram, 293 
Hypothesis tests, 277, 427 
Anderson-Darling, 430 
chi-square goodness-of-fit, 432 
Kolmogorov-S mirnov, 428 
likelihood ratio test, 436, 442 
p-value, 280 
significance level, 278 
uniformly most powerful, 279 


Hypothetical mean, 557 


I 


Incomplete beta function, 629 
Incomplete gamma function, 58, 627 
Independent increments, 210, 224 
Individual loss distribution, 137 
Individual risk model, 136, 192 
moments, 193 
direct calculation, 195 
recursion, 197 

Infinitely divisible distribution, 104 


Inflation 

effect of, 122 
effect of limit, 124 
Information, 352 
observed, 355 
Information matrix, 353 
Interpolation 

modified oscillatory, 505 
polynomial, 485 
Interval estimator, 275 
Inverse Burr distribution, 632 
Inverse exponential distribution, 638 
Inverse gamma distribution, 636 
Inverse Gaussian distribution, 48, 639 
Inverse paralogistic distribution, 635 
Inverse Pareto distribution, 633 
Inverse transformed distribution, 57 
Inverse transformed gamma distribution, 71, 635

Inverse Weibull distribution, 58, 637 
Inversion method for aggregate loss 
calculations, 161, 184 
Joint distribution, 518 


K 

k-point mixture distribution, 43
Kaplan-Meier estimator, 299 
large data sets, 322 
variance, 311 

Kernel density estimator, 316 


Left truncation, 297 
Likelihood function, 338 
Likelihood ratio test, 436, 442 
Limit 

effect of inflation, 124 
policy, 298 

Limited expected value, 30 
Limited fluctuation credibility, 516, 530 
partial, 535 

Limited loss variable, 30 
Linear exponential family, 371 
Log-t distribution, 639
Log-transformed confidence interval, 
313-314 

Logarithmic distribution, 87, 646 
Loglikelihood function, 339
Loglogistic distribution, 66, 634 
Lognormal distribution, 59, 71, 638 
Loss elimination ratio, 121 
Loss function, 364 
Lundberg's inequality, 230

M 

Makeham distribution, 458 
Marginal distribution, 362, 518 
Markov process, 215 
Maximization, 660 
simplex method, 664 
Maximum aggregate loss, 239 
Maximum covered loss, 126 
Maximum likelihood estimation, 337 
binomial, 390 
inverse Gaussian, 350 
negative binomial, 387 
Poisson, 384 
variance, 385 

truncation and censoring, 341
Maximum likelihood estimator 
consistency, 352 
unbiased, 352 
Mean, 25 

Mean excess loss, 29 
individual risk, 136
Model selection, 3 
graphical comparison, 421 
Schwarz Bayesian criterion, 443 
Modeling, advantages, 5 
Modeling process, 3 
Moment, 25 

Moment generating function, 36 
for aggregate loss, 142 
Moment 

individual risk model, 193 
factorial, 643 

limited expected value, 30 
of aggregate loss distribution, 142 
Mortality table construction, 322 
Multiple decrement tables, 324 

N

Negative binomial distribution, 76, 645 
as compound Poisson-logarithmic, 94 
as Poisson mixture, 78 
estimation, 386 

Negative hypergeometric distribution, 102 
Nelson-Aalen estimate, 290 
Neyman Type A distribution, 90, 649 
Noninformative prior distribution, 361 

O 

Observed information, 355 
Ogive, 293 

Ordinary deductible, 116, 297 
Oscillatory interpolation, 505 

P

p-value, 280 

Paralogistic distribution, 634 
Parameter, 3 
scale, 41 

Parametric distribution, 41, 284 
Parametric distribution family, 42 
Pareto distribution, 633 
Parsimony, 440 
Partial credibility, 535 
Percentile, 34 
Percentile matching, 333 
Plot 

density function, 423 
difference, 424 
distribution function, 421
Point estimation, 266 
Poisson—binomial distribution, 648 
Poisson-ETNB distribution, 398, 649 
Poisson-inverse Gaussian distribution, 398, 
650 

Poisson - logarithmic distribution, 94 
Poisson distribution, 73, 644 


estimation, 383 
Poisson process, 223 
Policy limit, 124, 298 
Polya-Aeppli distribution, 649 
Polya-Eggenberger distribution, 102 
Polynomial interpolation, 485 
Polynomial, collocation, 485 
Posterior distribution, 362 
Predictive distribution, 362, 545 
Prior distribution, noninformative or 
vague, 361 

Probability density function, 17 
Probability function, 19, 73 
Probability generating function, 36, 73 
for aggregate loss, 141 
Probability mass function, 19 
Process variance, 557 
Process 

Brownian motion, 252 
compound Poisson, 225 
continuous time, 2X0 
discrete time, 211, 215 
independent increments, 210, 224 
Markov, 215 
Poisson, 223 

stationary increments, 211, 224 
surplus, 212 
Wiener, 253 
white noise, 253 
Product-limit estimator, 299 
large data sets, 322 
variance, 311 

Proportional hazards model, 406 
Pseudorandom variables, 612 
Pure premium, 515 

R 

Random variable 
central moment, 27 
coefficient of variation, 27 
continuous, 16 
discrete, 16 
excess loss, 29 
kurtosis, 27 

left censored and shifted, 30 
left truncated and shifted, 29 
limited expected value, 30 
limited loss, 30 
mean, 25 

mean excess loss, 29 
mean residual life, 29 
median, 34 
mixed, 16 
mode, 23 
moment, 25 
percentile, 34 


right censored, 31 
skewness, 27 
standard deviation, 27 
support, 16 
variance, 27 
Recursive formula, 653 

aggregate loss distribution, 161 
continuous severity distribution, 655 
for compound frequency, 92 
Recursive method for aggregate loss 
calculations, 161 
Relative efficiency, 275 
Relative security loading, 225 
Right censored variable, 31 
Right censoring, 297 
Right truncation, 297 
Risk model 
collective, 135 
individual, 136, 192 
Risk set, 289, 298 
Ruin 

asymptotic, 244 

continuous time, finite horizon, 214 
continuous time, infinite horizon, 213 
discrete time, finite horizon, 214 
discrete time, infinite horizon, 214 
evaluation by convolution, 216 
evaluation by inversion, 219 
Lundberg’s inequality, 230 
Tijms’ approximation, 244-245 
time to, as inverse Gaussian, 260 
time to, as one-sided stable law, 261 
using Brownian motion, 256 
Ruin theory, 209 

S 

Scale distribution, 41 

Scale parameter, 41 

Schwarz Bayesian criterion, 443 

Security loading, relative, 225 

Severity, interaction with frequency, 651 

Severity/frequency interaction, 174 

Sibuya distribution, 88 

Significance level, 278 

Simplex method, 664 

Simulation, 611 

aggregate loss calculations, 618 
Single-parameter Pareto distribution, 640 
Skewness, 27 

Smoothed empirical percentile estimate, 
333 

Smoothing splines, 505 
Solver, 660 

Spliced distribution, 64 
Splines 
cubic, 489 


extrapolation, 504 
smoothing, 505 
Standard deviation, 27 
Stationary increments, 211, 224 
Stop-loss insurance, 145 
Support, 16 
Surplus process, 212 
maximum aggregate loss, 239 
Survival function, 16 

T 

Tail weight, 48 

Tijms’ approximation, 244-245 
Transformed beta distribution, 66, 631 
Transformed beta family, 69 
Transformed distribution, 57 
Transformed gamma distribution, 70, 635 
Transformed gamma family, 69 
Triangular kernel, 318 
Trigamma function, 588 
Truncation 
from above, 297 
from below, 297 
left, 297 
right, 297 

U 

Unbiased, 4, 268 

maximum likelihood estimator, 352 
Uniform kernel, 318 
Uniformly minimum variance unbiased 
estimator (UMVUE), 272 
Uniformly most powerful test, 279 

V 

Vague prior distribution, 361 
Variable-component mixture, 44 
Variance, 27, 522 
conditional, 521 
delta method, 356 
Greenwood’s approximation, 311 
product-limit estimator, 311 

W 

Waring distribution, 103, 379 
Weibull distribution, 48, 58, 637 
Wiener process, 253 
White noise process, 253 

Y 

Yule distribution, 103, 379 
Z 

Zero-modified distribution, 85, 647 
Zero-truncated binomial distribution, 647 
Zero-truncated geometric distribution, 646 


Zero-truncated negative binomial 
distribution, 647 

Zero-truncated Poisson distribution, 646 
Zeta distribution, 451 